New Contributor
Posts: 1
Registered: ‎07-12-2018

Read snappy files from HDFS (Hive)

Hello guys,

I have a problem with reading snappy files from HDFS.

From the beginning:

1. Files are compressed in Apache NiFi, on a separate cluster, by the CompressContent processor.

2. Files are sent to HDFS directly from NiFi, to

/test/snappy

3. An external table is created in Hive to read the data.

 

CREATE EXTERNAL TABLE test_snappy(
txt string)
LOCATION
'/test/snappy'
;

4. A simple query:

SELECT * FROM test_snappy;

returns 0 rows.

 

5. The hdfs dfs -text command fails with an error:

$ hdfs dfs -text /test/snappy/dummy_text.txt.snappy
18/07/13 08:46:47 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
        at java.io.InputStream.read(InputStream.java:101)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)

 

 

Here is my test file:

https://we.tl/pPUMQU028X

 

Do you have any clues?

Cloudera Employee
Posts: 37
Registered: ‎01-07-2019

Re: Read snappy files from HDFS (Hive)

You are doing a few things here, so it is hard to pinpoint exactly where the problem occurs.

If you suspect the problem occurs due to the compression, please try to do the same thing without compression.

If you suspect the problem occurs due to the way NiFi writes the file, please try to do the same thing with a file that you write to HDFS manually.

Changing one thing at a time like this should let you rule out possible causes until only the actual one remains.

If you identify the cause, hopefully the solution will be clear, but otherwise don't hesitate to ask!
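
One way to check the compression side on its own is to look at the first bytes of the file. The stack trace fails in BlockDecompressorStream, which reads two 4-byte big-endian lengths from the start of the stream and sizes a buffer from the second one. If the file was written in a different Snappy container (the Snappy framing format, or the stream format of the snappy-java library), those header bytes would be misread as lengths, which could explain the OutOfMemoryError. This has not been confirmed for the file in this thread; the sketch below is only a diagnostic aid, and the function names and the plausibility check for the Hadoop layout are illustrative assumptions.

```python
import struct

# Magic prefixes of two common Snappy stream containers.
FRAMED_MAGIC = b"\xff\x06\x00\x00sNaPpY"  # Snappy framing format stream identifier
XERIAL_MAGIC = b"\x82SNAPPY\x00"          # header written by snappy-java's SnappyOutputStream


def lengths_as_hadoop_would_read(head):
    """Interpret the first 8 bytes as two signed big-endian 32-bit ints:
    (uncompressed block size, compressed chunk size)."""
    return struct.unpack(">ii", head[:8])


def classify_snappy_header(head):
    """Guess the container format from the first bytes of a file.

    The Hadoop branch is a heuristic (an assumption of this sketch):
    both lengths must be positive, and the chunk length must not exceed
    Snappy's worst-case compressed size for the stated block size.
    """
    if head.startswith(FRAMED_MAGIC):
        return "snappy-framed"
    if head.startswith(XERIAL_MAGIC):
        return "snappy-java stream"
    if len(head) >= 8:
        raw_len, chunk_len = lengths_as_hadoop_would_read(head)
        max_compressed = 32 + raw_len + raw_len // 6
        if raw_len > 0 and 0 < chunk_len <= max_compressed:
            return "possibly Hadoop block format"
    return "unknown"


if __name__ == "__main__":
    # What a block-format reader would see in a snappy-java header:
    raw_len, chunk_len = lengths_as_hadoop_would_read(XERIAL_MAGIC)
    print(classify_snappy_header(XERIAL_MAGIC), raw_len, chunk_len)
```

To try it on the real file, the first bytes can be copied locally with something like `hdfs dfs -cat /test/snappy/dummy_text.txt.snappy | head -c 16 > head.bin` (head.bin is a hypothetical local name) and passed to `classify_snappy_header(open("head.bin", "rb").read())`. If the result is not the Hadoop block format, the next thing to check would be which compression format is selected in the CompressContent processor.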