
spark on yarn: java.lang.UnsatisfiedLinkError: ... NativeCodeLoader.buildSupportsSnappy()

Contributor

Hi all,

I am running a CDH 5.2 cluster with Spark on YARN. When I run jobs through spark-shell with a local driver I can read and process Snappy-compressed files, but as soon as I run the same scripts (a word count, for testing purposes) on YARN I get an UnsatisfiedLinkError (see below):

java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
        org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
        org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:110)
        org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:198)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:189)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:98)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
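
For reference, the script is essentially the standard word count, along these lines (the HDFS path below is just a placeholder for one of the Snappy-compressed inputs):

val lines = sc.textFile("hdfs:///path/to/input.snappy")
val counts = lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
counts.count()   // this is where the UnsatisfiedLinkError above is thrown on YARN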

I have tried pointing the library path at libsnappy.so.1 through a number of variables, including LD_LIBRARY_PATH, JAVA_LIBRARY_PATH, and SPARK_LIBRARY_PATH in spark-env.sh and hadoop-env.sh, as well as spark.executor.extraLibraryPath and spark.executor.extraClassPath in spark-defaults.conf.
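
For example, the attempts looked roughly like this (the native directory is the standard CDH package location, /usr/lib/hadoop/lib/native, so adjust if yours differs):

# in spark-env.sh (and the equivalent in hadoop-env.sh)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/usr/lib/hadoop/lib/native

# in spark-defaults.conf
spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native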

 

I am at a loss as to what could be causing this problem since running locally works perfectly.

 

Any pointers/ideas would be really helpful.

1 ACCEPTED SOLUTION

Contributor

The solution I found was to add the following environment variables to spark-env.sh. The first two lines let spark-shell read Snappy files when run in local mode, and the third makes it possible for spark-shell to read Snappy files in YARN mode.

 

export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
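
With those in place, restarting spark-shell against YARN and reading one of the Snappy files again is a quick way to confirm the native library is being picked up (the path is just an example):

spark-shell --master yarn-client
scala> sc.textFile("hdfs:///path/to/input.snappy").count()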

 


4 REPLIES

Contributor

The solution I found was to add the following environment variables to spark-env.sh. The first two lines let spark-shell read Snappy files when run in local mode, and the third makes it possible for spark-shell to read Snappy files in YARN mode.

 

export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH"

 

Explorer

You can include the property below in spark-defaults.conf, pointing it at the Hadoop native library directory:

spark.driver.extraLibraryPath   <native library path, e.g. $HADOOP_HOME/lib/native/>
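
Spelled out with the usual CDH location, that would be something like:

spark.driver.extraLibraryPath   /usr/lib/hadoop/lib/native

or, equivalently, passed on the command line when launching:

spark-shell --driver-library-path /usr/lib/hadoop/lib/native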

Contributor

I tried that.  It didn't work.

New Contributor

I have seen Hadoop pick up native libraries from the running user's home directory and put them on the library path. Maybe the same thing is happening to you with Spark. Check your home directory for

ls ~/lib*
libhadoop.a       libhadoop.so        libhadooputils.a  libsnappy.so    libsnappy.so.1.1.3
libhadooppipes.a  libhadoop.so.1.0.0  libhdfs.a         libsnappy.so.1

and delete them if found.  I could be totally off, but this was the culprit in our case.
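
If you want to see which native libraries Hadoop itself is resolving (and from where), hadoop checknative is a quick check; the exact output depends on your install:

hadoop checknative -a

It reports whether native hadoop, zlib, snappy, lz4, etc. support was found and which .so file each one resolved to.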