
Error When Running spark-submit: "native snappy library not available"


Hi All,

I'm getting the following error when I try to submit a Spark job that reads a sequence file.

18/06/07 19:35:25 ERROR Executor: Exception in task 8.0 in stage 16.0 (TID 611)
java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1985)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1829)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1843)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:251)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:250)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

Here are my environment details:

1) Spark 2.2.1

2) Scala 2.11.8
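
For reference, a minimal sketch of this kind of read; the path and the Writable key/value types are illustrative placeholders, not details from this post:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

object ReadSeqFile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-seqfile"))
    // Placeholder path and Writable types; adjust to your data.
    val rdd = sc.sequenceFile("/data/input.seq", classOf[LongWritable], classOf[Text])
    // The read is lazy: the snappy decompressor is only requested when a task
    // materializes a partition, which is where the RuntimeException surfaces.
    println(rdd.count())
    sc.stop()
  }
}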

1 ACCEPTED SOLUTION


@Manikandan Jeyabal

Add the following settings to your custom spark-defaults:

spark.driver.extraClassPath=/usr/hdp/current/hadoop-client/lib/snappy*.jar
spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
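
If you'd rather not change cluster-wide defaults, the equivalent can be passed per job on the spark-submit command line. Since the stack trace shows the failure inside executor tasks, on a multi-node cluster you may also need the executor-side library path. The class and jar names below are placeholders:

spark-submit \
  --driver-class-path /usr/hdp/current/hadoop-client/lib/snappy*.jar \
  --driver-library-path /usr/hdp/current/hadoop-client/lib/native \
  --conf spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native \
  --class YourMainClass your-app.jar

Note the shell expands the snappy*.jar glob here; if more than one jar matches, join them with ':' instead.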

There is also another thread with the same suggestion here.

HTH

*** If this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.


REPLIES


@Manikandan Jeyabal

Have you installed the following packages on all your cluster nodes?

# yum install snappy snappy-devel
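
You can also verify whether the libhadoop build on a node actually has snappy compiled in:

# hadoop checknative -a

This prints a true/false line per native library (zlib, snappy, lz4, bzip2, ...). If snappy reports false, installing the OS packages alone is not enough; the library path must point at a libhadoop built with snappy support.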


https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/instal...


It's already installed, and the issue is resolved now. Thanks for your response.



Thanks for your help @Felix Albani, this let me get the job running without any platform modifications.