
Please help! -> ERROR SnappyCompressor: failed to load SnappyCompressor in SparkContext

Explorer

I get the following error when I create a SparkContext in standalone mode using a Scala class:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.PipelineModel

// Spark configuration for local (standalone) mode, pointing at the sandbox Hive metastore.
val sparkConfig = new SparkConf()
  .setAppName("test")
  .setMaster("local")
  .set("hive.metastore.uris", "thrift://sandbox.hortonworks.com:9083")

// Create the session with Hive support enabled.
val spark = SparkSession.builder()
  .config(sparkConfig)
  .enableHiveSupport()
  .getOrCreate()

// Load a previously saved ML pipeline model from HDFS (actual path omitted here).
val model = PipelineModel.load("snappy model path from hdfs")

18/10/03 17:59:28 ERROR SnappyCompressor: failed to load SnappyCompressor
java.lang.NoSuchFieldError: clazz
    at org.apache.hadoop.io.compress.snappy.SnappyCompressor.initIDs(Native Method)
    at org.apache.hadoop.io.compress.snappy.SnappyCompressor.<clinit>(SnappyCompressor.java:57)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:71)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:195)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$anon$1.liftedTree1$1(HadoopRDD.scala:252)
    at org.apache.spark.rdd.HadoopRDD$anon$1.<init>(HadoopRDD.scala:251)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
18/10/03 17:59:28 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded. at

3 REPLIES

Mentor

@sparkhadoop

Yes, you are definitely missing the compression libraries. You will need to install Snappy and LZO, and run the installs on all nodes in the cluster.

For Snappy (on Linux):

sudo yum install snappy snappy-devel 

For LZO (on Linux):

sudo yum install lzo lzo-devel hadooplzo hadooplzo-native 

Check the official HWX documentation on installing compression libraries.
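If you want to confirm from the driver JVM itself whether the Hadoop native libraries (and the Snappy bindings) are actually visible, a quick check like the sketch below can help. This is only an illustrative snippet, not something from the thread; it uses Hadoop's NativeCodeLoader and SnappyCodec classes, and the HDP library path mentioned in the comment is an assumption for a sandbox install.

import org.apache.hadoop.io.compress.SnappyCodec
import org.apache.hadoop.util.NativeCodeLoader

object NativeSnappyCheck {
  def main(args: Array[String]): Unit = {
    // True only if libhadoop was found and loaded by this JVM.
    println(s"Hadoop native code loaded: ${NativeCodeLoader.isNativeCodeLoaded}")

    // java.library.path must include the directory holding libhadoop / libsnappy
    // (on HDP this is typically /usr/hdp/current/hadoop-client/lib/native).
    println(s"java.library.path = ${System.getProperty("java.library.path")}")

    // SnappyCodec.checkNativeCodeLoaded() throws a RuntimeException when native
    // Snappy support is missing -- the same condition reported in the stack trace.
    try {
      SnappyCodec.checkNativeCodeLoaded()
      println("Native Snappy support is available.")
    } catch {
      case t: Throwable => println(s"Native Snappy support is NOT available: ${t.getMessage}")
    }
  }
}

On the cluster nodes themselves, running hadoop checknative -a from a shell reports the same information.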

Explorer

Thanks @Geoffrey Shelton Okot for the reply!

But these are already installed. I am using the HDP 2.5 Hortonworks sandbox.

Mentor

@sparkhadoop

Can you look for this property in HDFS --> Configs --> Advanced and set its value to:

io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec

Then restart the components with stale configs and retry.
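For a job run with --master local outside the Ambari-managed daemons, the same codec list can also be passed to the Hadoop configuration that Spark builds, either via spark.hadoop.* properties or on the session's hadoopConfiguration. This is only a sketch of that idea, not advice from the thread, and it still assumes the native Snappy library is present on java.library.path:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Any SparkConf key prefixed with "spark.hadoop." is copied into the Hadoop
// Configuration that Spark uses for HDFS and InputFormat access.
val conf = new SparkConf()
  .setAppName("test")
  .setMaster("local")
  .set("spark.hadoop.io.compression.codecs",
    "org.apache.hadoop.io.compress.GzipCodec," +
    "org.apache.hadoop.io.compress.DefaultCodec," +
    "org.apache.hadoop.io.compress.SnappyCodec")

val spark = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()

// The equivalent setting can also be applied after the session exists:
spark.sparkContext.hadoopConfiguration.set(
  "io.compression.codecs",
  "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec")

Note that this only tells Hadoop which codec classes to use; it does not fix a missing or mismatched native libsnappy/libhadoop, so the install step above is still required.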
