
Unable to use a simple command on data loaded from HDFS on Spark 1.6



I'm using Spark 1.6 with Python (Jupyter notebook), and I'm trying to load data from HDFS and work with it. My code:

lines=sc.textFile("hdfs:///user/maria_dev/oil.csv")
lines.take(2)

I get this error:


Py4JJavaError: An error occurred while calling o36.partitions.
: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:185)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:64)
    at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    ... 26 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 31 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2114)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    ... 33 more
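
A trace like this is easiest to read from the bottom up: the last "Caused by" entry is the root cause, here a ClassNotFoundException for com.hadoop.compression.lzo.LzoCodec. That usually suggests the Hadoop configuration lists the LZO codec while the jar that provides it is not on the classpath Spark is using, though that is an inference from the trace, not something confirmed in this thread. As a rough sketch, the cause chain can be pulled out of the error text with plain Python. The helper name cause_chain and the shortened sample string are illustrative assumptions, not part of Spark or Py4J.

```python
import re

# A stack frame ("   at pkg.Class.method(") or an elision marker ("... 26 more")
# marks the end of an exception message.
_FRAME = re.compile(r"\s+(?:at\s+[\w.$<>]+\(|\.\.\.\s+\d+\s+more)")

def cause_chain(trace):
    """Return the 'Caused by' messages of a Java stack trace, outermost first.

    Hypothetical helper for illustration; it only does string handling.
    """
    causes = []
    for part in trace.split("Caused by: ")[1:]:
        # Keep only the text before the first frame or elision marker.
        head = _FRAME.split(part, maxsplit=1)[0]
        causes.append(head.strip())
    return causes

# Shortened copy of the trace from the post (most frames omitted).
sample = (
    "java.lang.RuntimeException: Error in configuring object "
    "    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112) "
    "Caused by: java.lang.reflect.InvocationTargetException "
    "    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) "
    "    ... 26 more "
    "Caused by: java.lang.IllegalArgumentException: "
    "Compression codec com.hadoop.compression.lzo.LzoCodec not found. "
    "    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179) "
    "    ... 31 more "
    "Caused by: java.lang.ClassNotFoundException: "
    "Class com.hadoop.compression.lzo.LzoCodec not found "
    "    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2114) "
    "    ... 33 more"
)

chain = cause_chain(sample)
root = chain[-1]  # the innermost cause is listed last
print(root)
```

In a notebook, the same function could be applied to str(e) after catching the exception raised by lines.take(2).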