java.lang.RuntimeException: native-lzo library not available Error on CDH 5.3 with Spark 1.2

New Contributor

I'm unable to get Spark to work with the LZO parcel on CDH 5.3.

 

I've attempted the steps outlined here:

http://hsiamin.com/posts/2014/05/03/enable-lzo-compression-on-hadoop-pig-and-spark/

 

I have also verified that the paths to the HADOOP_LZO parcel's lib and native directories are listed in java.library.path (as shown in the Spark application web UI for the failed submissions).
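
A check like the one described above can also be scripted. The following is a minimal sketch, assuming the native library in question is named libgplcompression.so (the file name that appears in the parcel listing later in this thread); the function name is hypothetical, and the java.library.path string has to be copied by hand from the Spark web UI.

```python
import os


def dirs_with_native_lzo(library_path, lib_name="libgplcompression.so"):
    """Return the java.library.path entries that actually contain lib_name.

    library_path is the raw path string (entries joined by os.pathsep),
    e.g. as copied from the Spark application web UI.
    lib_name is an assumption based on the GPLEXTRAS parcel listing.
    """
    hits = []
    for entry in library_path.split(os.pathsep):
        # Skip empty entries produced by leading/trailing separators.
        if entry and os.path.isfile(os.path.join(entry, lib_name)):
            hits.append(entry)
    return hits
```

An empty result would suggest that the path entries exist in the setting but the library file is not readable at those locations on the node where the check runs. It has to be run on the executor hosts, not only on the gateway.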

 

Any advice would be greatly appreciated. 

4 REPLIES

Master Collaborator

Hi Guys,

 

I'm running into an issue where my Spark jobs fail with the error below. I'm using Spark 1.6.0 with CDH 5.13.0.

I tried to figure it out, with no success.

I would appreciate any help or pointers on how to approach this issue.

User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 3, xxxxxx, executor 1): java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzoCodec.getDecompressorType(LzoCodec.java:193)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1995)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1881)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
at com.liveperson.dallas.lp.utils.incremental.DallasGenericTextFileRecordReader.initialize(DallasGenericTextFileRecordReader.java:64)
at com.liveperson.hadoop.fs.inputs.LPCombineFileRecordReaderWrapper.initialize(LPCombineFileRecordReaderWrapper.java:38)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initialize(CombineFileRecordReader.java:63)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:168)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:

 

I see the LZO files in the GPLEXTRAS parcel:

ll
total 104
-rw-r--r-- 1 cloudera-scm cloudera-scm 35308 Oct 4 2017 COPYING.hadoop-lzo
-rw-r--r-- 1 cloudera-scm cloudera-scm 62268 Oct 4 2017 hadoop-lzo-0.4.15-cdh5.13.0.jar
lrwxrwxrwx 1 cloudera-scm cloudera-scm 31 May 3 07:23 hadoop-lzo.jar -> hadoop-lzo-0.4.15-cdh5.13.0.jar
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 Oct 4 2017 native

 

I see an LZO shared library only for Impala:

[root@xxxxxxx ~]# locate *lzo*.so*
/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/impala/lib/libimpalalzo.so
/usr/lib64/liblzo2.so.2
/usr/lib64/liblzo2.so.2.0.0

The directory /opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop/lib/native contains:

-rwxr-xr-x 1 cloudera-scm cloudera-scm 22918 Oct 4 2017 libgplcompression.a
-rwxr-xr-x 1 cloudera-scm cloudera-scm 1204 Oct 4 2017 libgplcompression.la
-rwxr-xr-x 1 cloudera-scm cloudera-scm 1205 Oct 4 2017 libgplcompression.lai
-rwxr-xr-x 1 cloudera-scm cloudera-scm 15760 Oct 4 2017 libgplcompression.so
-rwxr-xr-x 1 cloudera-scm cloudera-scm 15768 Oct 4 2017 libgplcompression.so.0
-rwxr-xr-x 1 cloudera-scm cloudera-scm 15768 Oct 4 2017 libgplcompression.so.0.0.0


and /opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/spark-netlib/lib contains:

-rw-r--r-- 1 cloudera-scm cloudera-scm 8673 Oct 4 2017 jniloader-1.1.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm 53249 Oct 4 2017 native_ref-java-1.1.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm 53295 Oct 4 2017 native_system-java-1.1.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm 1732268 Oct 4 2017 netlib-native_ref-linux-x86_64-1.1-natives.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm 446694 Oct 4 2017 netlib-native_system-linux-x86_64-1.1-natives.jar


Note: The issue occurs only with the Spark job; the MapReduce job works fine.
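
Since only the Spark job fails, one thing that may be worth ruling out is whether the Spark driver and executors see the same native directory and jar that MapReduce does. The sketch below builds spark-submit `--conf` arguments from Spark's extraLibraryPath and extraClassPath settings. The helper name is hypothetical, and the jar location in the usage example is an assumption (the thread shows hadoop-lzo.jar next to the native directory but does not state the full path). It may not be the cause in every case.

```python
def lzo_spark_conf_args(native_dir, lzo_jar):
    """Build spark-submit --conf arguments that point Spark at hadoop-lzo.

    native_dir: directory holding libgplcompression.so
    lzo_jar:    path to hadoop-lzo.jar (assumed location, verify on the node)
    """
    settings = {
        "spark.driver.extraLibraryPath": native_dir,
        "spark.executor.extraLibraryPath": native_dir,
        "spark.driver.extraClassPath": lzo_jar,
        "spark.executor.extraClassPath": lzo_jar,
    }
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    parcel = "/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29"
    # The jar path below is an assumption, not confirmed by the thread.
    print(" ".join(lzo_spark_conf_args(
        parcel + "/lib/hadoop/lib/native",
        parcel + "/lib/hadoop/lib/hadoop-lzo.jar",
    )))
```

The printed arguments can be pasted into a spark-submit command line; if the job then succeeds, the default Spark configuration on the cluster is the place to look.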

Master Collaborator

@GeKas Can you have a quick look here and help, please?

Super Collaborator

To be honest, I have not used LZO in Spark.

I suppose that you have Spark running under YARN and not standalone.

In that case, the first thing I would check is that LZO is configured in the YARN available codecs, "io.compression.codecs". Moreover, have you configured HDFS as described here? https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_gpl_extras.html#xd_583c10bfdbd...
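
The codec check suggested above could be sketched as follows. This assumes the client configuration is a standard Hadoop XML file (for example a core-site.xml, whose location on your nodes is not given in this thread) and looks for the codec class named in the stack trace, com.hadoop.compression.lzo.LzoCodec. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Class name taken from the stack trace in this thread.
LZO_CODEC = "com.hadoop.compression.lzo.LzoCodec"


def configured_codecs(config_xml):
    """Return the codec class names listed under io.compression.codecs."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == "io.compression.codecs":
            value = prop.findtext("value", default="")
            # The value is a comma-separated list of class names.
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


def lzo_codec_enabled(config_xml):
    """True if the LZO codec class appears in io.compression.codecs."""
    return LZO_CODEC in configured_codecs(config_xml)
```

To use it, read the file contents into a string and pass them in, e.g. `lzo_codec_enabled(open(path).read())`, where `path` is wherever the deployed client configuration lives on the node.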

Master Collaborator

Thanks all, and especially @GeKas. Just to update: I was able to solve the issue. It was a leftover from enabling Kerberos on the cluster. I had installed the Oracle JDK, which installed the java1.7_cloudera package; once I removed this package from the node, the LZO error was gone.