Pig using AvroStorage: Snappy compression doesn't work

Expert Contributor

I tried to create an Avro file using Snappy compression without success. Any ideas?

I set these properties to enable Snappy compression:

SET mapred.output.compress true
SET mapred.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec
SET avro.output.codec snappy

The job fails with this error:

java.lang.Exception: java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
        at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
        at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
        at org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
        at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:349)
        at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:348)
        at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:360)
        at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:367)
        at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:375)
        at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.close(PigAvroRecordWriter.java:44)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.close(PigOutputFormat.java:146)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:670)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2019)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:797)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

http://stackoverflow.com/questions/34883866/avrostorage-snappy-conmpression

Mentor

@John Smith

You're using the old mapred API property names. Look up the mapreduce API names of the compression properties and use those in Pig instead.
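
To make the renaming concrete, here is a minimal Python sketch that rewrites Pig SET lines from the old mapred.* compression property names to their Hadoop 2 mapreduce.* names. The table is a hand-picked subset of Hadoop's deprecated-properties list, and modernize_set_line is a hypothetical helper for illustration, not part of Pig or Hadoop.

```python
import re

# Subset of Hadoop 2's deprecated-property table (old mapred.* name -> new name),
# limited to the compression settings discussed in this thread.
DEPRECATED_TO_NEW = {
    "mapred.output.compress": "mapreduce.output.fileoutputformat.compress",
    "mapred.output.compression.codec": "mapreduce.output.fileoutputformat.compress.codec",
    "mapred.output.compression.type": "mapreduce.output.fileoutputformat.compress.type",
    "mapred.compress.map.output": "mapreduce.map.output.compress",
    "mapred.map.output.compression.codec": "mapreduce.map.output.compress.codec",
}

# Matches "SET <key> <value>" with an optional trailing semicolon; SET is case-insensitive.
_SET_RE = re.compile(r"^\s*(set)\s+(\S+)\s+(.*?)\s*(;?)\s*$", re.IGNORECASE)


def modernize_set_line(line: str) -> str:
    """Return the SET line with a deprecated mapred.* key replaced by its new name.

    Lines that are not SET statements, or whose key is not in the table
    (for example avro.output.codec), are returned unchanged.
    """
    match = _SET_RE.match(line)
    if not match:
        return line
    keyword, key, value, semicolon = match.groups()
    new_key = DEPRECATED_TO_NEW.get(key)
    if new_key is None:
        return line
    return f"{keyword} {new_key} {value}{semicolon}"
```

Run over the three SET lines from the question, only the two mapred.* keys change; avro.output.codec is an Avro setting rather than a Hadoop one and passes through untouched. Hadoop 2 generally still accepts the deprecated names (logging a deprecation warning), so treat this as a tidy-up rather than a guaranteed fix.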

Expert Contributor

It looks like only these commands should be used:

SET mapreduce.map.output.compress true;
SET mapreduce.map.output.compress.codec org.apache.hadoop.io.compress.SnappyCodec;

http://www.cloudera.com/content/www/en-us/documentation/archive/cdh/4-x/4-3-0/CDH4-Installation-Guid...

Expert Contributor

But I can't see any compression when I look directly at the output Avro file.
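
One way to check which codec a file actually uses is to read the header of the Avro object container file, where the writer records the codec under the avro.codec metadata key (absent or "null" means uncompressed). Below is a minimal stdlib-only Python sketch of that header parse, following the Avro container-file layout (magic bytes, then a metadata map of string keys to bytes values); read_avro_codec and its helpers are hypothetical names.

```python
import io

AVRO_MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def _read_long(buf: io.BytesIO) -> int:
    """Read one Avro 'long': a zig-zag encoded, little-endian base-128 varint."""
    shift = 0
    result = 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise ValueError("truncated varint")
        b = byte[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo zig-zag encoding


def _read_bytes(buf: io.BytesIO) -> bytes:
    """Read a length-prefixed Avro 'bytes' or 'string' value."""
    length = _read_long(buf)
    data = buf.read(length)
    if len(data) != length:
        raise ValueError("truncated bytes value")
    return data


def read_avro_codec(header: bytes) -> str:
    """Return the block codec recorded in an Avro container file header.

    The header is the magic bytes followed by a metadata map; the codec lives
    under 'avro.codec' and defaults to 'null' (uncompressed) when absent.
    """
    buf = io.BytesIO(header)
    if buf.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = _read_long(buf)
        if count == 0:  # a zero-count block ends the map
            break
        if count < 0:  # negative count: use its absolute value, then skip the block size
            count = -count
            _read_long(buf)
        for _ in range(count):
            key = _read_bytes(buf).decode("utf-8")
            metadata[key] = _read_bytes(buf)
    return metadata.get("avro.codec", b"null").decode("utf-8")
```

For example, with open(path, "rb") as f: print(read_avro_codec(f.read())), where path is whichever output part file you want to inspect. The header itself is never compressed, so the JSON schema stays readable even in a Snappy file, and Snappy stores non-repeating bytes as literals, so string values can still be visible; eyeballing the raw bytes is not a reliable test. If avro-tools is available, its getmeta command prints the same metadata.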

Mentor

@John Smith You're using intermediate compression, so only the output of the map stage is being compressed. That's great for performance, but what you need now is to set output compression, not map.output compression. That tells Pig to compress the final job output. Here's an example.
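
The distinction between the two stages can be modelled with a small Python sketch: parse Pig SET lines into a dict and report which stage each switch turns on. This is a deliberate simplification, not how Hadoop resolves its configuration; it only looks at the Hadoop 2 switches mapreduce.map.output.compress and mapreduce.output.fileoutputformat.compress, ignores deprecated aliases and codec choices, and the helper names are hypothetical.

```python
from typing import Dict, Iterable

# Hadoop 2 on/off switches for each stage (deprecated mapred.* aliases omitted for brevity).
INTERMEDIATE_SWITCH = "mapreduce.map.output.compress"         # map output sent to the shuffle
FINAL_SWITCH = "mapreduce.output.fileoutputformat.compress"   # files written by the job


def parse_set_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Collect 'SET key value[;]' lines into a dict (simplified, not Pig's real grammar)."""
    settings: Dict[str, str] = {}
    for line in lines:
        parts = line.strip().rstrip(";").split(None, 2)
        if len(parts) == 3 and parts[0].lower() == "set":
            settings[parts[1]] = parts[2].strip()
    return settings


def compression_stages(settings: Dict[str, str]) -> Dict[str, bool]:
    """Report whether intermediate and final compression are switched on."""
    def enabled(key: str) -> bool:
        return settings.get(key, "false").lower() == "true"

    return {"intermediate": enabled(INTERMEDIATE_SWITCH), "final": enabled(FINAL_SWITCH)}
```

With only the map-output properties set, the model reports intermediate compression without final compression, which is consistent with the output file not looking compressed.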

Mentor

@John Smith Did it work for you? If so, can you please accept one of the answers to close out the thread?