Explorer
Posts: 23
Registered: ‎12-29-2013

native snappy library not available: SnappyCompressor has not been loaded

I can't seem to get Snappy to work. I am using Cloudera's pre-built VM for CDH 4.5.0, though I switched it to JDK 1.7.

 

I am trying to run a MapReduce job that uses HBase as its source and writes Mahout vectors to a sequence file. I want the output to be block-compressed with Snappy. For a while I was getting UnsatisfiedLinkErrors because my JRE did not have the snappy*.so files, and adding them to my classpath did not resolve the issue. So I ran these commands to copy the Snappy native libraries into my JRE:

 

$ cd /usr/lib/hadoop/lib/native
$ sudo cp *.so /usr/java/latest/jre/lib/amd64/
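
To double-check which directories the JRE actually searches for native libraries, I also printed its default java.library.path (-XshowSettings is a HotSpot-specific flag, so this assumes the Oracle/OpenJDK JVM):

$ java -XshowSettings:properties -version

(the settings are printed to stderr; look for the java.library.path entry in the output)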

 

That got me past the UnsatisfiedLinkError, but I am still getting this exception:


Caused by: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.

 

Here is my code:

--------------------------------------------

package jinvestor.jhouse.mr;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

import jinvestor.jhouse.core.House;
import jinvestor.jhouse.core.util.HouseAvroUtil;
import jinvestor.jhouse.download.HBaseHouseDAO;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.io.compress.snappy.SnappyCompressor;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

/**
 * Produces mahout vectors from House entries in HBase.
 *
 * @author Michael Scott Knapp
 *
 */
public class HouseVectorizer {

    private final Configuration configuration;
    private final House minimumHouse;
    private final House maximumHouse;

    public HouseVectorizer(final Configuration configuration,
            final House minimumHouse, final House maximumHouse) {
        this.configuration = configuration;
        this.minimumHouse = minimumHouse;
        this.maximumHouse = maximumHouse;
    }

    public void vectorize() throws IOException, ClassNotFoundException, InterruptedException {
        JobConf jobConf = new JobConf();
        jobConf.setMapOutputKeyClass(LongWritable.class);
        jobConf.setMapOutputValueClass(VectorWritable.class);

        // we want the vectors written straight to HDFS,
        // the order does not matter.
        jobConf.setNumReduceTasks(0);

        Path outputDir = new Path("/home/cloudera/house_vectors");
        FileSystem fs = FileSystem.get(configuration);
        if (fs.exists(outputDir)) {
            fs.delete(outputDir, true);
        }

        FileOutputFormat.setOutputPath(jobConf, outputDir);

        // I want the mappers to know the max and min value
        // so they can normalize the data.
        // I will add them as properties in the configuration,
        // by serializing them with avro.
        String minmax = HouseAvroUtil.toBase64String(Arrays.asList(minimumHouse,
                maximumHouse));
        jobConf.set("minmax", minmax);
        jobConf.setCompressMapOutput(true);
        jobConf.setMapOutputCompressorClass(SnappyCodec.class);

        Job job = Job.getInstance(jobConf);
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("data"));
        TableMapReduceUtil.initTableMapperJob("homes", scan,
                HouseVectorizingMapper.class, LongWritable.class,
                VectorWritable.class, job);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(VectorWritable.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(VectorWritable.class);
        
        SequenceFileOutputFormat.setOutputPath(job, outputDir);
        
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
        
        // If you get an UnsatisfiedLinkError from Snappy here, try the following in a terminal:
        //   $ cd /usr/lib/hadoop/lib/native
        //   $ sudo cp *.so /usr/java/latest/jre/lib/amd64/
        // If that does not resolve it, try modifying your mapred-site.xml
        // so that the SnappyCodec is used everywhere.
        // If that still does not resolve it, switch to the DefaultCodec, but please
        // don't commit that change.
        System.out.println(SnappyCompressor.isNativeCodeLoaded());
        SequenceFileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        job.getConfiguration().setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class,
                CompressionCodec.class);
        job.waitForCompletion(true);
    }
}

 

=========================

Here is my exception:

 


java.lang.Exception: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:401)
Caused by: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:68)
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118)
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1169)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1080)
    at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1400)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:274)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:617)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:737)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:233)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

 

What am I missing?

Posts: 1,896
Kudos: 433
Solutions: 303
Registered: ‎07-31-2013

Re: native snappy library not available: SnappyCompressor has not been loaded

I am not sure how you're running your job (the trace shows the local job runner and a thread pool executor are involved), but your JVM can't load a native library if that library isn't on its java.library.path.

Add the system property -Djava.library.path=/usr/lib/hadoop/lib/native to the command that spawns your JVM, or simply use 'hadoop jar …', which sets the library path for you.
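
For example (a rough sketch: your-job.jar and the driver class below are placeholders, and the native-library path is the default CDH location):

$ java -Djava.library.path=/usr/lib/hadoop/lib/native -cp your-job.jar:$(hadoop classpath) your.driver.MainClass

or, letting the hadoop launcher set the library path and classpath for you:

$ hadoop jar your-job.jar your.driver.MainClass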
New Contributor
Posts: 1
Registered: ‎07-20-2016

Re: native snappy library not available: SnappyCompressor has not been loaded

I am getting the error below when executing a Spark Scala command in the Cloudera VM 5.7.0:

scala> sqlContext.sql("select * from departments").count()

16/07/17 07:38:55 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.


How do I get this resolved in the Cloudera VM?

 

I added the properties below to mapred-site.xml and restarted the cluster, but the issue still persists.

<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>

<property>
<name>mapred.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>


Cloudera says to include the above properties in mapred-site.xml, but it does not seem to be working for me:

http://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_ig_snappy_mapreduce.html
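
In case it is relevant, I also wanted to check whether the libhadoop in use was built with Snappy support; my understanding is that hadoop checknative reports this:

$ hadoop checknative -a

and the snappy line should show true or false ("false" would match the "built without snappy support" message above).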

Expert Contributor
Posts: 68
Registered: ‎10-04-2016

Re: native snappy library not available: SnappyCompressor has not been loaded

I have the same issue on CDH 5.7.x with a parcel installation, but not on the 5.8 QuickStart VM (RPM install). Something must be missing in the configuration or the installation.
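
If it helps narrow it down, one thing worth comparing between the two setups is which native libraries each actually ships (the paths below assume the default parcel and package locations):

$ ls /opt/cloudera/parcels/CDH/lib/hadoop/lib/native   # parcel install
$ ls /usr/lib/hadoop/lib/native                        # package (RPM) install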
