Support Questions

Find answers, ask questions, and share your expertise

Spark 2.0 with GPLEXTRAS

Good Afternoon,

 

We're giving the Spark 2.0 beta a try on a cluster running CDH 5.9 that has GPLEXTRAS deployed. Under Spark 1.6 we haven't noticed any problems, but with 2.0 the RDD interface for reading text files fails because the LZO JARs and native libraries (from GPLEXTRAS) don't appear to be on the classpath. For example:

 

scala> sc.textFile("/any/path").count()
java.lang.RuntimeException: Error in configuring object
  at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
  at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
  at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
  at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:185)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1933)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1128)
  ... 48 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
  ... 63 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
  at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
  at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
  at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
  ... 68 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
  at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
  ... 70 more

 

I'm a bit stumped on how to tackle this. It looks like the SPARK_EXTRA_LIB_PATH and SPARK_DIST_CLASSPATH environment variables are where I should be looking, but these appear to be managed by Cloudera Manager (CM).
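For reference, the kind of change I was considering is sketched below. This is only a guess at a workaround, not something I've verified: the GPLEXTRAS parcel path and JAR layout are assumptions about my cluster, and on a CM-managed cluster these variables would normally be set through a spark-env.sh configuration snippet rather than edited by hand.

```shell
# Hypothetical workaround sketch -- paths are assumptions based on a
# default GPLEXTRAS parcel install; adjust for your parcel layout.
GPLEXTRAS_HOME=/opt/cloudera/parcels/GPLEXTRAS

# Add the hadoop-lzo JAR(s) to the classpath Spark distributes to drivers
# and executors, and the LZO native libraries to the library path.
export SPARK_DIST_CLASSPATH="${SPARK_DIST_CLASSPATH}:${GPLEXTRAS_HOME}/lib/hadoop/lib/*"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GPLEXTRAS_HOME}/lib/hadoop/lib/native"
```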

 

Any ideas?

 

Thanks in advance,

 

 - Andrew

1 ACCEPTED SOLUTION

This problem occurred with beta1. After upgrading to beta2 we no longer see this problem.

2 REPLIES

Community Manager

I am happy to see that the upgrade resolved your issue. Best of luck as you continue with the project. 🙂


Cy Jervis, Manager, Community Program