Internal error. org.xerial.snappy.SnappyNative.uncompressedLength

When loading Hive Parquet data in Pig through HCatalog, I run into this error: Internal error. org.xerial.snappy.SnappyNative.uncompressedLength

My script is:

abc_flow = LOAD 'TEST.abc_flow' using org.apache.hive.hcatalog.pig.HCatLoader(); 
table1 = filter abc_flow by year in ('2011') and month in ('2') and day in ('1'); 
table10 = limit table1 10; 
dump table10; 

I get the following error log:

ERROR 2998: Unhandled internal error. org.xerial.snappy.SnappyNative.uncompressedLength(Ljava/nio/ByteBuffer;II)I
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.uncompressedLength(Ljava/nio/ByteBuffer;II)I
    at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
    at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:561)
    at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
    at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:204)
    at org.apache.parquet.column.impl.ColumnReaderImpl.readPageV1(ColumnReaderImpl.java:591)
    at org.apache.parquet.column.impl.ColumnReaderImpl.access$300(ColumnReaderImpl.java:60)
    at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:540)
    at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:537)
    at org.apache.parquet.column.page.DataPageV1.accept(DataPageV1.java:96)
    at org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:537)
    at org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:529)
    at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:641)
    at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:357)
    at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:82)
    at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:77)
    at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270)
    at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
    at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
    at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
    at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
    at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
    at org.apache.parquet.pig.ParquetLoader.getNext(ParquetLoader.java:230)
    at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:251)
    at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:137)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNextTuple(POLimit.java:122)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:159)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:157)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:302)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1431)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1075)
    at org.apache.pig.PigServer.store(PigServer.java:1038)
    at org.apache.pig.PigServer.openIterator(PigServer.java:951)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:631)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
1 ACCEPTED SOLUTION

Contributor

This is caused by a Snappy version mismatch between Hadoop and Pig.
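One way to check for such a mismatch is to list the snappy-java jars each component ships and compare their versions. This is only a rough sketch: `list_snappy_jars` is a hypothetical helper, the install directories are assumptions that depend on your distribution, and Snappy classes may also be bundled inside another jar, which this search will not find.

```shell
# Hypothetical helper: list snappy-java jars under the given directories
# so that their versions can be compared by eye.
# Usage: list_snappy_jars DIR [DIR...]
list_snappy_jars() {
  for dir in "$@"; do
    # Skip directories that do not exist on this node.
    [ -d "$dir" ] || continue
    find "$dir" -name 'snappy-java*.jar' 2>/dev/null
  done | sort
}
```

For example, `list_snappy_jars /usr/lib/hadoop /usr/lib/pig` (both paths are assumed, not taken from the question). If the jars it prints carry different version numbers, that is consistent with the mismatch described here.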

You can resolve it by running the following command before starting the Grunt shell:

export HADOOP_USER_CLASSPATH_FIRST=true

To avoid running this command every time you start the Grunt shell, add the line to pig-env.sh and deploy that configuration file to the nodes.
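Adding the line to pig-env.sh could be scripted roughly as follows. This is a sketch, not a tested deployment procedure: `ensure_classpath_first` is a hypothetical helper, and the location of pig-env.sh is an assumption that varies by installation.

```shell
# Hypothetical helper: append the export line to a pig-env.sh file
# unless an identical line is already present.
# Usage: ensure_classpath_first /path/to/pig-env.sh
ensure_classpath_first() {
  env_file="$1"
  line='export HADOOP_USER_CLASSPATH_FIRST=true'
  # -x matches the whole line, -F treats the pattern as a fixed string.
  if ! grep -qxF "$line" "$env_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$env_file"
  fi
}
```

For example, `ensure_classpath_first /etc/pig/conf/pig-env.sh` (an assumed path). Running it twice leaves a single copy of the line, so it is safe to repeat on each node.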
