Created on 11-04-2016 05:39 AM - edited 09-16-2022 03:46 AM
Good Afternoon,
We're giving the Spark 2.0 beta a try on a cluster running CDH 5.9 that has GPLEXTRAS deployed. Under Spark 1.6 we haven't noticed any problems, but under 2.0 reading text files through the RDD interface fails; it looks like the LZO JARs and native libraries (from GPLEXTRAS) aren't on the classpath. For example:
scala> sc.textFile("/any/path").count()
java.lang.RuntimeException: Error in configuring object
  at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
  at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
  at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
  at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:185)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1933)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1128)
  ... 48 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
  ... 63 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
  at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
  at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
  at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
  ... 68 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
  at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
  ... 70 more
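The root cause at the bottom of the trace is a plain ClassNotFoundException, so it looks like the driver JVM simply can't see the codec class at all. A quick sanity check from the shell should throw the same ClassNotFoundException if the hadoop-lzo JAR really isn't on the driver classpath (and should succeed under Spark 1.6):

scala> // sanity check: can the driver JVM load the LZO codec class?
scala> Class.forName("com.hadoop.compression.lzo.LzoCodec")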
I'm a bit stumped on how to tackle this. It looks like the SPARK_EXTRA_LIB_PATH and SPARK_DIST_CLASSPATH environment variables are where I should be looking, but those seem to be managed by Cloudera Manager (CM).
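In the meantime, passing the GPLEXTRAS JARs and native libraries in by hand when launching the shell seems like a plausible stopgap. This is only a sketch: the parcel paths below are the usual GPLEXTRAS layout on our nodes and may well differ on yours:

spark2-shell \
  --driver-class-path "/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/*" \
  --conf spark.executor.extraClassPath="/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/*" \
  --conf spark.driver.extraLibraryPath=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native \
  --conf spark.executor.extraLibraryPath=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native

That would cover a manual launch, but it doesn't answer how to get CM to apply this cluster-wide, which is really what I'm after.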
Any ideas?
Thanks in advance,
- Andrew
Created 11-10-2016 04:11 AM
This problem occurred with beta1. After upgrading to beta2, we no longer see it.
Created 11-10-2016 05:14 AM
I am happy to see that the upgrade resolved your issue. Best of luck as you continue with the project. 🙂