Explorer
Posts: 23
Registered: 12-29-2013

native snappy library not available: SnappyCompressor has not been loaded

I can't seem to get Snappy to work. I am using Cloudera's pre-built VM for CDH 4.5.0, though I switched to JDK 1.7.

 

I am trying to run a MapReduce job that uses HBase as a source and produces Mahout vectors in a sequence file. I want the output to be block-compressed with Snappy. For a while I was getting UnsatisfiedLinkErrors because my JRE did not have the snappy*.so files. Adding those to my classpath did not resolve the issue, so I ran these commands to get the Snappy native libraries into my JRE:

 

$ cd /usr/lib/hadoop/lib/native
$ sudo cp *.so /usr/java/latest/jre/lib/amd64/

 

That got me past the UnsatisfiedLinkError, but I am still getting this exception:


Caused by: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.

 

Here is my code:

--------------------------------------------

package jinvestor.jhouse.mr;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

import jinvestor.jhouse.core.House;
import jinvestor.jhouse.core.util.HouseAvroUtil;
import jinvestor.jhouse.download.HBaseHouseDAO;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.io.compress.snappy.SnappyCompressor;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

/**
 * Produces mahout vectors from House entries in HBase.
 *
 * @author Michael Scott Knapp
 *
 */
public class HouseVectorizer {

    private final Configuration configuration;
    private final House minimumHouse;
    private final House maximumHouse;

    public HouseVectorizer(final Configuration configuration,
            final House minimumHouse, final House maximumHouse) {
        this.configuration = configuration;
        this.minimumHouse = minimumHouse;
        this.maximumHouse = maximumHouse;
    }

    public void vectorize() throws IOException, ClassNotFoundException, InterruptedException {
        JobConf jobConf = new JobConf();
        jobConf.setMapOutputKeyClass(LongWritable.class);
        jobConf.setMapOutputValueClass(VectorWritable.class);

        // we want the vectors written straight to HDFS,
        // the order does not matter.
        jobConf.setNumReduceTasks(0);

        Path outputDir = new Path("/home/cloudera/house_vectors");
        FileSystem fs = FileSystem.get(configuration);
        if (fs.exists(outputDir)) {
            fs.delete(outputDir, true);
        }

        FileOutputFormat.setOutputPath(jobConf, outputDir);

        // I want the mappers to know the max and min value
        // so they can normalize the data.
        // I will add them as properties in the configuration,
        // by serializing them with avro.
        String minmax = HouseAvroUtil.toBase64String(Arrays.asList(minimumHouse,
                maximumHouse));
        jobConf.set("minmax", minmax);
        jobConf.setCompressMapOutput(true);
        jobConf.setMapOutputCompressorClass(SnappyCodec.class);

        Job job = Job.getInstance(jobConf);
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("data"));
        TableMapReduceUtil.initTableMapperJob("homes", scan,
                HouseVectorizingMapper.class, LongWritable.class,
                VectorWritable.class, job);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(VectorWritable.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(VectorWritable.class);
        
        SequenceFileOutputFormat.setOutputPath(job, outputDir);
        
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
        
        // if you get an UnsatisfiedLinkError due to this using Snappy, then try the following in a terminal:
//        $ cd /usr/lib/hadoop/lib/native
//        $ sudo cp *.so /usr/java/latest/jre/lib/amd64/
        // if that does not resolve it, then try modifying your mapreduce-site.xml
        // so the SnappyCodec is used everywhere.
        // if that does not resolve it, then switch to the DefaultCodec, but please
        // don't commit that change.
        System.out.println(SnappyCompressor.isNativeCodeLoaded());
        SequenceFileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        job.getConfiguration().setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class,
                CompressionCodec.class);
        job.waitForCompletion(true);
    }
}

 

=========================

Here is my exception:

 


java.lang.Exception: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:401)
Caused by: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:68)
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118)
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1169)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1080)
    at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1400)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:274)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:617)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:737)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:233)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

 

What am I missing?

Posts: 1,892
Kudos: 432
Solutions: 302
Registered: 07-31-2013

Re: native snappy library not available: SnappyCompressor has not been loaded

I am not sure how you're running your job (the trace suggests the local job runner and a thread pool executor are involved), but your JVM can't load native libraries if they are not on its library path.

Add the system property -Djava.library.path=/usr/lib/hadoop/lib/native to your JVM spawning command, or simply use 'hadoop jar …'.
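
For example (the jar name and arguments below are placeholders; HouseVectorizer is used only as an example driver class, assuming it has a main method):

$ hadoop jar your-job.jar jinvestor.jhouse.mr.HouseVectorizer

or, if you spawn the JVM yourself:

$ java -Djava.library.path=/usr/lib/hadoop/lib/native -cp your-job.jar:$(hadoop classpath) jinvestor.jhouse.mr.HouseVectorizer

The 'hadoop jar' launcher sets java.library.path to the native directory for you, which is why it is the simpler option.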
New Contributor
Posts: 1
Registered: 07-20-2016

Re: native snappy library not available: SnappyCompressor has not been loaded

I am getting the below error when executing a Spark Scala command in the Cloudera VM 5.7.0:

scala> sqlContext.sql("select * from departments").count()

16/07/17 07:38:55 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.


How do I get this resolved in the Cloudera VM?

 

I added the below properties to mapred-site.xml and restarted the cluster, but the issue still persists.

<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>

<property>
<name>mapred.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>


Cloudera says to include the above properties in mapred-site.xml, but it does not seem to be working for me:

http://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_ig_snappy_mapreduce.html
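
For reference, whether the libhadoop bundled with the VM was built with Snappy support at all can be checked from a terminal (generic diagnostic, output varies by install):

$ hadoop checknative -a

If Snappy is reported as false there, the problem is the native libhadoop/libsnappy build rather than the mapred-site.xml properties. Also, for the Spark shell the native library directory has to be on the driver JVM's library path; something like the following is worth trying (the path is the usual CDH location, adjust for your install):

$ spark-shell --driver-library-path /usr/lib/hadoop/lib/native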

Expert Contributor
Posts: 68
Registered: 10-04-2016

Re: native snappy library not available: SnappyCompressor has not been loaded

I have the same issue on CDH 5.7.x with a parcel installation, but not on the 5.8 quickstart VM (RPM). Something must be missing in the configuration, or the installation.
