
Error When Running spark-submit: "native snappy library not available"

Contributor

Hi All,

I'm getting the following error when I try to submit a Spark job that reads a sequence file.

18/06/07 19:35:25 ERROR Executor: Exception in task 8.0 in stage 16.0 (TID 611)
java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1985)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1829)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1843)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:251)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:250)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

My environment details:

1) Spark 2.2.1

2) Scala 2.11.8
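
For reference, a minimal sketch of the kind of read that triggers this (the input path, main class, and Writable types below are placeholders, not my exact job):

import org.apache.hadoop.io.Text
import org.apache.spark.sql.SparkSession

object ReadSeqFile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadSeqFile").getOrCreate()
    val sc = spark.sparkContext

    // SequenceFileInputFormat (seen in the stack trace) decompresses each
    // record as it is read; that is where the missing native Snappy library fails.
    val rdd = sc.sequenceFile("/data/input/seqfiles", classOf[Text], classOf[Text])

    // Convert Text to String before collecting, since Text is a mutable Writable.
    rdd.map { case (k, v) => (k.toString, v.toString) }.take(5).foreach(println)

    spark.stop()
  }
}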

1 ACCEPTED SOLUTION


@Manikandan Jeyabal

Add the following settings to your custom spark-defaults:

spark.driver.extraClassPath=/usr/hdp/current/hadoop-client/lib/snappy*.jar
spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
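
If you prefer not to touch spark-defaults, the same settings can be passed per job on the spark-submit command line (the main class and jar below are placeholders):

# Pass the same driver classpath/library path per job; quoting keeps the
# shell from expanding the jar glob before Spark sees it.
spark-submit \
  --master yarn \
  --conf spark.driver.extraClassPath="/usr/hdp/current/hadoop-client/lib/snappy*.jar" \
  --conf spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native \
  --class com.example.ReadSeqFile \
  my-app.jar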

There is also another thread with the same suggestion here.

HTH


4 REPLIES

Master Mentor

@Manikandan Jeyabal

Have you installed the following packages on all your cluster nodes?

# yum install snappy snappy-devel


https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/instal...
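
After installing, you can verify that libhadoop actually picks up the native Snappy library with Hadoop's built-in check:

# Lists the native libraries available to libhadoop; the snappy line
# should report "true" along with the path to libsnappy.so.
hadoop checknative -a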

Contributor

It's already installed, and the issue is resolved now. Thanks for your response.


Contributor

Thanks for your help @Felix Albani. This let me run the job without any platform modification.