Created 10-04-2018 06:58 AM
I get the following error when I create a SparkContext in standalone mode from a Scala class:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.PipelineModel

val sparkConfig = new SparkConf()
  .setAppName("test")
  .setMaster("local")
  .set("hive.metastore.uris", "thrift://sandbox.hortonworks.com:9083")

val spark = SparkSession.builder()
  .config(sparkConfig)
  .enableHiveSupport()
  .getOrCreate()

val model = PipelineModel.load("snappy model path from hdfs")
18/10/03 17:59:28 ERROR SnappyCompressor: failed to load SnappyCompressor
java.lang.NoSuchFieldError: clazz
    at org.apache.hadoop.io.compress.snappy.SnappyCompressor.initIDs(Native Method)
    at org.apache.hadoop.io.compress.snappy.SnappyCompressor.<clinit>(SnappyCompressor.java:57)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:71)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:195)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:251)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
18/10/03 17:59:28 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded. at
Created 10-04-2018 06:58 AM
Yes, you are definitely missing the compression libraries. You will need to install Snappy and LZO, and run the installs on all nodes in the cluster.
For Snappy:
sudo yum install snappy snappy-devel
For LZO:
sudo yum install lzo lzo-devel hadooplzo hadooplzo-native
Check the official HWX document on installing compression libraries.
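After installing, you can verify that the JVM actually sees the native Snappy binding by running hadoop checknative -a on the command line, or from inside the application with a check like the sketch below (a minimal sketch; it assumes only the standard Hadoop NativeCodeLoader and SnappyCodec classes that are already on Spark's classpath):

import org.apache.hadoop.util.NativeCodeLoader
import org.apache.hadoop.io.compress.SnappyCodec

// True only if libhadoop.so was found on java.library.path
println(s"Hadoop native library loaded: ${NativeCodeLoader.isNativeCodeLoaded}")
// True only if libhadoop was built with Snappy support and libsnappy.so resolves
println(s"Snappy codec available: ${SnappyCodec.isNativeCodeLoaded}")

Both should print true before SnappyCodec can be used to read compressed input.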
Created 10-04-2018 06:58 AM
Thanks @Geoffrey Shelton Okot for the reply!
But these are already installed. I am using the HDP 2.5 Hortonworks sandbox.
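Since the packages are installed but the error still appears in local mode, one possibility (an assumption, not something confirmed in this thread) is that the driver JVM launched from the IDE never gets the HDP native directory on its java.library.path. A quick diagnostic sketch:

// Print the path the local-mode driver JVM actually searches for native libraries.
println(sys.props("java.library.path"))

If the HDP native directory (typically /usr/hdp/current/hadoop-client/lib/native, a path assumed from standard HDP layouts) is not listed, it has to be added at JVM launch time, e.g. via -Djava.library.path=..., because the native loader cannot pick it up after the JVM has started.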
Created 10-04-2018 08:06 AM
Can you look for this property in HDFS --> Configs --> Advanced and set its value to:
io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec
Then restart the services with stale configs and retry.
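For a standalone/local run like the one in the question, the same codec list can also be passed programmatically; Spark copies any property prefixed with spark.hadoop. into the underlying Hadoop Configuration. A minimal sketch reusing the names from the original snippet:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Properties prefixed with "spark.hadoop." are forwarded into the Hadoop
// Configuration, mirroring the Ambari-side io.compression.codecs setting.
val sparkConfig = new SparkConf()
  .setAppName("test")
  .setMaster("local")
  .set("spark.hadoop.io.compression.codecs",
    "org.apache.hadoop.io.compress.GzipCodec," +
      "org.apache.hadoop.io.compress.DefaultCodec," +
      "org.apache.hadoop.io.compress.SnappyCodec")

val spark = SparkSession.builder()
  .config(sparkConfig)
  .enableHiveSupport()
  .getOrCreate()

Note that this only registers the codec classes; the native libsnappy must still be loadable by the JVM as described above.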