Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

spark on yarn: java.lang.UnsatisfiedLinkError: ... NativeCodeLoader.buildSupportsSnappy()

Contributor

Hi all,

I am running a CDH 5.2 cluster with Spark on YARN. When I run jobs through spark-shell with a local driver, I am able to read and process Snappy-compressed files. However, as soon as I try to run the same script (a word count, for testing purposes) on YARN, I get an UnsatisfiedLinkError (see below):

java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
        org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
        org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:110)
        org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:198)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:189)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:98)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)

I have tried to point the library path at libsnappy.so.1 with a plethora of variables, including LD_LIBRARY_PATH, JAVA_LIBRARY_PATH, and SPARK_LIBRARY_PATH in spark-env.sh and hadoop-env.sh, as well as spark.executor.extraLibraryPath and spark.executor.extraClassPath in spark-defaults.conf.

I am at a loss as to what could be causing this problem, since running locally works perfectly. As far as I can tell, the error means the executor JVMs on the YARN nodes never load the native Hadoop library (libhadoop.so, built with Snappy support), so the JVM cannot bind the native method buildSupportsSnappy().

Any pointers/ideas would be really helpful.

1 ACCEPTED SOLUTION

Contributor

The solution I found was to add the following environment variables to spark-env.sh. The first two lines let spark-shell read Snappy files when running in local mode, and the third makes the same possible when running in yarn mode.


export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
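
For anyone hitting this later, you can sanity-check that the native libraries (including Snappy support) are visible before involving Spark at all. This is just a verification sketch using the stock Hadoop CLI; the paths shown are typical for a package-based CDH install and may differ on your nodes:

hadoop checknative -a
# on a healthy node this should report something like:
#   hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so
#   snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1

With the three exports in place, a quick word count in yarn mode (the HDFS path here is just an example) should no longer throw the UnsatisfiedLinkError:

spark-shell --master yarn-client
scala> sc.textFile("hdfs:///user/example/words.snappy").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).take(10)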


4 REPLIES


Explorer

You can include the line below in spark-defaults.conf, pointing it at the native library path:

spark.driver.extraLibraryPath   $HADOOP_HOME/lib/native/
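
For completeness, a hedged sketch of the matching pair of properties: the failing code in the stack trace above runs in the executors on YARN, so the executor-side property is the one that matters there. As far as I know, spark-defaults.conf does not expand shell variables, so a literal path is the safer form; adjust it to wherever your native libraries actually live:

spark.driver.extraLibraryPath     /usr/lib/hadoop/lib/native
spark.executor.extraLibraryPath   /usr/lib/hadoop/lib/native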

Contributor

I tried that. It didn't work.

Visitor

I've seen Hadoop load native libraries from the running user's home directory onto the classpath. Maybe the same thing is happening to you with Spark. Check your home directory for:

ls ~/lib*
libhadoop.a       libhadoop.so        libhadooputils.a  libsnappy.so    libsnappy.so.1.1.3
libhadooppipes.a  libhadoop.so.1.0.0  libhdfs.a         libsnappy.so.1

If any of those are there, delete them. I could be totally off, but this was the culprit in our case.