
spark on yarn: java.lang.UnsatisfiedLinkError: ... NativeCodeLoader.buildSupportsSnappy()

SOLVED

Explorer

Hi all,

I am running a CDH 5.2 cluster with Spark on YARN. When I run jobs through spark-shell with a local driver I am able to read and process Snappy-compressed files; however, as soon as I try to run the same scripts (wordcount, for testing purposes) on YARN I get an UnsatisfiedLinkError (see below):

java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
        org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
        org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:110)
        org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:198)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:189)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:98)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
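
For reference, a minimal way to reproduce the difference (the HDFS path is hypothetical; any Snappy-compressed text file will do):

# local mode: the driver finds the native codec and the count succeeds
spark-shell --master local[2]
#   scala> sc.textFile("hdfs:///tmp/words.txt.snappy").count()

# yarn mode: the same read fails on the executors with the error above
spark-shell --master yarn-client
#   scala> sc.textFile("hdfs:///tmp/words.txt.snappy").count()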

I have tried pointing the library path at libsnappy.so.1 with a plethora of variables, including LD_LIBRARY_PATH, JAVA_LIBRARY_PATH, and SPARK_LIBRARY_PATH in spark-env.sh and hadoop-env.sh, as well as spark.executor.extraLibraryPath and spark.executor.extraClassPath in spark-defaults.conf.

I am at a loss as to what could be causing this problem, since running locally works perfectly.

Any pointers/ideas would be really helpful.
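
For concreteness, the spark-defaults.conf attempts took roughly this shape (values illustrative, assuming the standard CDH package layout; none of them resolved the error):

# illustrative values only
spark.executor.extraLibraryPath   /usr/lib/hadoop/lib/native
spark.executor.extraClassPath     /usr/lib/hadoop/lib/*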

1 ACCEPTED SOLUTION

Re: spark on yarn: java.lang.UnsatisfiedLinkError: ... NativeCodeLoader.buildSupportsSnappy()

Explorer

The solution I found was to add the following environment variables to spark-env.sh. The first two lines let spark-shell read Snappy files when run in local mode, and the third makes it possible for spark-shell to read Snappy files in YARN mode.

# make native Snappy visible to the local JVM (driver)
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/hadoop/lib/native
# propagate both paths to the YARN containers so the executors can load it too
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
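
If you want to confirm that the native bindings are actually visible, hadoop checknative is a quick sanity check (available in the Hadoop 2.x line that ships with CDH 5.2; the output below is illustrative and the path depends on your installation):

hadoop checknative -a
# Native library checking:
#   snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1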

 

4 REPLIES

Re: spark on yarn: java.lang.UnsatisfiedLinkError: ... NativeCodeLoader.buildSupportsSnappy()

Explorer

You can include the property below in spark-defaults.conf, set to the Hadoop native library path:

spark.driver.extraLibraryPath   $HADOOP_HOME/lib/native/
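
The same settings can also be passed per job on the command line; a sketch, assuming the usual CDH path (note that spark-defaults.conf does not expand environment variables, so a literal path is the safest value; your-app.jar stands in for your application):

spark-submit \
  --driver-library-path /usr/lib/hadoop/lib/native \
  --conf spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native \
  your-app.jar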

Re: spark on yarn: java.lang.UnsatisfiedLinkError: ... NativeCodeLoader.buildSupportsSnappy()

Explorer

I tried that.  It didn't work.


Re: spark on yarn: java.lang.UnsatisfiedLinkError: ... NativeCodeLoader.buildSupportsSnappy()

New Contributor

I have seen Hadoop pick up native libraries from the running user's home directory and put them on the library path. Maybe the same thing is happening to you with Spark. Check your home directory for:

ls ~/lib*
libhadoop.a       libhadoop.so        libhadooputils.a  libsnappy.so    libsnappy.so.1.1.3
libhadooppipes.a  libhadoop.so.1.0.0  libhdfs.a         libsnappy.so.1

and delete them if found.  I could be totally off, but this was the culprit in our case.
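
If you do find stray copies, a cautious approach is to move them aside rather than delete them outright, then check which libsnappy the dynamic loader resolves; a sketch:

# park the stray native libraries instead of deleting them
mkdir -p ~/native-lib-backup && mv ~/lib* ~/native-lib-backup/

# see which libsnappy the system loader knows about
ldconfig -p | grep snappy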