spark on yarn: java.lang.UnsatisfiedLinkError: ... NativeCodeLoader.buildSupportsSnappy()
Labels: Apache Hadoop, Apache Spark, Apache YARN
Created on 12-16-2014 11:03 AM - edited 09-16-2022 02:15 AM
Hi all,
I am running a CDH 5.2 cluster with Spark on YARN. When I run jobs through spark-shell with a local driver, I am able to read and process Snappy-compressed files; however, as soon as I try to run the same scripts (a wordcount, for testing purposes) on YARN, I get an UnsatisfiedLinkError (see below):
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
    org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
    org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:110)
    org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:198)
    org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:189)
    org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:98)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)
I have tried to point the library path at libsnappy.so.1 with a plethora of variables, including LD_LIBRARY_PATH, JAVA_LIBRARY_PATH, and SPARK_LIBRARY_PATH in spark-env.sh and hadoop-env.sh, as well as spark.executor.extraLibraryPath and spark.executor.extraClassPath in spark-defaults.conf.
I am at a loss as to what could be causing this problem since running locally works perfectly.
Any pointers/ideas would be really helpful.
Created 12-16-2014 03:25 PM
The solution I found was to add the following environment variables to spark-env.sh. The first two lines let spark-shell read Snappy files when run in local mode, and the third lets it read Snappy files when run in YARN mode.
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/hadoop/lib/native
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
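For reference, a quick way to confirm the fix is the wordcount test mentioned above, run from a spark-shell started on YARN. A minimal sketch, assuming some Snappy-compressed text file in HDFS (the path below is just a placeholder):

// Word count over a Snappy-compressed text file; the path is a placeholder.
// If the native libraries are visible to the YARN executors, this no longer
// throws the UnsatisfiedLinkError shown above.
val lines = sc.textFile("hdfs:///tmp/sample.txt.snappy")
val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
counts.take(10).foreach(println)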
Created 12-23-2014 02:54 AM
You can include the line below in spark-defaults.conf, pointing spark.driver.extraLibraryPath at the Hadoop native library directory (i.e. $HADOOP_HOME/lib/native/):
spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native
Created 12-23-2014 06:21 AM
I tried that. It didn't work.
Created 05-05-2015 01:05 PM
I have seen Hadoop pick up native libraries from the running user's home directory and put them on the library path. Maybe the same thing is happening to you with Spark. Check your home directory for
ls ~/lib*
libhadoop.a       libhadoop.so        libhadooputils.a  libsnappy.so    libsnappy.so.1.1.3
libhadooppipes.a  libhadoop.so.1.0.0  libhdfs.a         libsnappy.so.1
and delete them if found. I could be totally off, but this was the culprit in our case.
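A quick sanity check from spark-shell (a sketch; both calls are standard JVM/Hadoop APIs) is to print the library path the JVM is actually using and ask Hadoop's NativeCodeLoader, the class in the stack trace above, whether the native code was loaded:

// Where this JVM is looking for native libraries.
println(System.getProperty("java.library.path"))
// Whether Hadoop's native code loader (see the stack trace above) succeeded.
println("native hadoop loaded: " + org.apache.hadoop.util.NativeCodeLoader.isNativeCodeLoaded)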
