Created on 06-11-2018 01:01 PM - edited 09-16-2022 06:19 AM
Hi All,
I'm getting the following error when I try to submit a Spark job that reads a sequence file.
18/06/07 19:35:25 ERROR Executor: Exception in task 8.0 in stage 16.0 (TID 611)
java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1985)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1829)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1843)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:251)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:250)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
My environment details:
1) Spark 2.2.1
2) Scala 2.11.8
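For context, here is a minimal sketch of the kind of read that hits this code path; the path, key/value classes, and object name below are placeholders, not my actual job:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.sql.SparkSession

object ReadSeqFile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadSeqFile").getOrCreate()
    val sc = spark.sparkContext
    // The codec is read from the SequenceFile header, so a Snappy-compressed
    // file needs the native snappy library available on every executor node.
    val rdd = sc.sequenceFile("/path/to/seqfile", classOf[LongWritable], classOf[Text])
    // Hadoop reuses Writable objects, so convert to String before collecting.
    rdd.map { case (_, v) => v.toString }.take(10).foreach(println)
    spark.stop()
  }
}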
Created 06-11-2018 01:45 PM
Add the following settings to your custom spark-defaults:
spark.driver.extraClassPath=/usr/hdp/current/hadoop-client/lib/snappy*.jar
spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
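If you would rather pass these per job than edit spark-defaults, the same properties can go on the spark-submit command line (the class and jar names below are placeholders; adjust the HDP paths to your layout):

spark-submit \
  --conf "spark.driver.extraClassPath=/usr/hdp/current/hadoop-client/lib/snappy*.jar" \
  --conf "spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native" \
  --class com.example.ReadSeqFile myjob.jar

Since the stack trace above comes from an executor, the matching spark.executor.extraClassPath and spark.executor.extraLibraryPath properties may be needed as well.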
There is also another thread here with the same suggestion.
HTH
Created 06-11-2018 01:38 PM
Have you installed the following packages on all your cluster nodes?
# yum install snappy snappy-devel
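Once installed, you can check whether the libhadoop on a node was actually built with Snappy support using the bundled checknative tool, which prints true or false for each native codec:

# hadoop checknative -a

If the snappy line shows false, that node will keep throwing the error above regardless of the Spark settings.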
Created 06-13-2018 05:21 PM
It's already installed, and the issue is resolved now. Thanks for your response.
Created 06-13-2018 05:19 PM
Thanks for your help @Felix Albani, it got me running without any platform modifications.