
Read snappy files from HDFS (Hive)



Hello guys,

I have a problem reading Snappy files from HDFS.

From the beginning:

1. The files are compressed in Apache NiFi, on a separate cluster, using the CompressContent processor.

2. The files are sent directly from NiFi to HDFS, into:

/test/snappy

3. An external table is created in Hive to read the data:


CREATE EXTERNAL TABLE test_snappy(
txt string)
LOCATION
'/test/snappy'
;


4. A simple query:

SELECT * FROM test_snappy;

returns 0 rows.


5. The hdfs dfs -text command fails with an error:

$ hdfs dfs -text /test/snappy/dummy_text.txt.snappy
18/07/13 08:46:47 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
        at java.io.InputStream.read(InputStream.java:101)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)


Here is my test file:

https://we.tl/pPUMQU028X


Do you have any clues?


Re: Read snappy files from HDFS (Hive)

You are doing a few things here, so it is hard to pinpoint exactly where the problem occurs.

If you suspect the compression is causing the problem, try the same steps without compression.

If you suspect the problem lies in the way NiFi writes the file, try the same steps while writing the file manually.
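Before re-running the whole pipeline, a quick look at the first bytes of the file can help separate the compression question from the NiFi question. In Hadoop's block format (used by SnappyCodec through BlockCompressorStream), each block starts with a 4-byte big-endian uncompressed length, and each compressed chunk starts with a 4-byte big-endian compressed length. BlockDecompressorStream.getCompressedData, the top frame in the stack trace above, reads that chunk length and allocates a buffer of that size. If the file is actually in a different Snappy container format, those bytes can decode to a huge length. The Python sketch below assumes you have a local copy of the file (for example fetched with hdfs dfs -get). The magic prefixes for snappy-java's stream format and for the Snappy framing format are assumptions to verify against those projects; nothing in this thread confirms which format NiFi produced. The script name snappy_head.py is hypothetical.

```python
import struct
import sys

# ASSUMPTION: leading bytes of two common non-Hadoop Snappy container formats.
# Verify against the snappy-java sources and the Snappy framing format spec.
SNAPPY_JAVA_MAGIC = b"\x82SNAPPY\x00"            # snappy-java SnappyOutputStream header
SNAPPY_FRAMED_MAGIC = b"\xff\x06\x00\x00sNaPpY"  # framing-format stream identifier chunk


def hadoop_block_lengths(head: bytes) -> tuple:
    """Decode the first two big-endian signed 32-bit ints the way Hadoop's
    BlockDecompressorStream would: the uncompressed block length, then the
    length of the first compressed chunk (the buffer size it allocates)."""
    if len(head) < 8:
        raise ValueError("need at least 8 bytes")
    return struct.unpack(">ii", head[:8])


def guess_format(head: bytes) -> str:
    """Return a rough guess of the Snappy container format from the leading bytes."""
    if head.startswith(SNAPPY_JAVA_MAGIC):
        return "snappy-java stream"
    if head.startswith(SNAPPY_FRAMED_MAGIC):
        return "snappy framing format"
    if len(head) >= 8:
        raw_len, chunk_len = hadoop_block_lengths(head)
        # Real Hadoop block data should have non-negative lengths.
        if raw_len >= 0 and chunk_len >= 0:
            return "possibly hadoop block format"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python snappy_head.py dummy_text.txt.snappy
    with open(sys.argv[1], "rb") as f:
        head = f.read(16)
    print("guess:", guess_format(head))
    if len(head) >= 8:
        raw_len, chunk_len = hadoop_block_lengths(head)
        print(f"Hadoop would read block length {raw_len}, chunk length {chunk_len}")
```

If the file starts with the snappy-java header, the second int decodes to 1347442944, so Hadoop would try to allocate a buffer of roughly 1.3 GB for the first chunk. That would be consistent with the OutOfMemoryError above, but it stays a hypothesis until the actual file is checked.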

Hopefully this approach lets you rule out the possible causes one at a time and find the exact one.

If you identify the cause, hopefully the solution will be clear, but otherwise don't hesitate to ask!