
Spark2 submit: CDH 6.3.3 using pyspark FAILS

New Contributor

I am getting the error below with my PySpark program. It occurs when I try to cache some tables. I get the same error with the DataFrameWriter API when I write a DataFrame to a file on HDFS:

  File "/tmp/bin/loan_acct_financ_calc_module.py", line 145, in main
    spark.sql("cache table {}".format(table))
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 778, in sql
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o92.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 4 times, most recent failure: Lost task 15.3 in stage 0.0 (TID 35, idoop46.devin1.ms.com, executor 5): java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/plan/TableDesc;
        at java.lang.Class.getDeclaredFields0(Native Method)
        at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
        at java.lang.Class.getDeclaredField(Class.java:2068)
        at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
        at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
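For anyone trying to reproduce this, the two operations described in the question can be sketched as follows. This is a minimal sketch, not the poster's actual code: it assumes an existing SparkSession passed in as `spark`, the helper names `cache_tables` and `write_dataframe` are hypothetical, and Parquet is an assumed output format (the question only says "a file on HDFS").

```python
# Minimal sketch of the two operations from the question.
# Assumptions (hypothetical, not from the original post): `spark` is an
# existing SparkSession, the table names and HDFS path are placeholders,
# and Parquet is used as the output format.

def cache_tables(spark, tables):
    """Cache each table with Spark SQL, the same call that fails at line 145."""
    for table in tables:
        # CACHE TABLE is eager by default, so this statement launches a job;
        # per the trace, the tasks fail while being deserialized on an executor.
        spark.sql("CACHE TABLE {}".format(table))


def write_dataframe(df, path):
    """Write a DataFrame to HDFS with the DataFrameWriter API."""
    # Parquet and overwrite mode are assumptions; use whatever the job needs.
    df.write.mode("overwrite").parquet(path)
```

Because `CACHE TABLE` runs eagerly unless written as `CACHE LAZY TABLE`, the error surfaces on the `spark.sql` line itself rather than at a later action.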

Cloudera Employee

Hi Suresh,

The error "java.lang.NoClassDefFoundError" suggests that a JAR file is missing from the classpath. Since the task failed on executor 5, the class has to be available on the executors, not only on the driver. Could you please make sure all the relevant JAR files are on the classpath? Are you running this PySpark program for the first time? Could you share the command you are using to run it?
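One way to act on this is to check which JARs actually contain the missing class. The name in the error, `Lorg/apache/hadoop/hive/ql/plan/TableDesc;`, is the JVM descriptor form of `org.apache.hadoop.hive.ql.plan.TableDesc`, which a JAR stores as the entry `org/apache/hadoop/hive/ql/plan/TableDesc.class`. The following is a minimal, standard-library-only sketch; the helper names are hypothetical, and you supply the classpath string yourself (for example, the JAR directories you believe the executors use), so it makes no assumption about how CDH lays out its parcels.

```python
import os
import zipfile


def class_to_entry(class_name):
    """Convert 'a.b.C' to the archive entry name 'a/b/C.class'."""
    return class_name.replace(".", "/") + ".class"


def find_jars_with_class(classpath, class_name):
    """Return the JAR files on `classpath` that contain `class_name`.

    `classpath` is a string of entries separated by os.pathsep. An entry
    ending in '*' is expanded to every .jar file in that directory, following
    the Java classpath wildcard convention. Non-JAR entries and unreadable or
    missing files are skipped.
    """
    entry = class_to_entry(class_name)
    jars = []
    for item in classpath.split(os.pathsep):
        if item.endswith("*"):
            directory = item[:-1] or "."
            if os.path.isdir(directory):
                jars.extend(
                    os.path.join(directory, name)
                    for name in sorted(os.listdir(directory))
                    if name.endswith(".jar")
                )
        elif item.endswith(".jar"):
            jars.append(item)

    hits = []
    for jar in jars:
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except (OSError, zipfile.BadZipFile):
            # Missing or corrupt archive: it cannot supply the class anyway.
            continue
    return hits
```

Running it on an executor host, for example as `find_jars_with_class("/path/to/jars/*", "org.apache.hadoop.hive.ql.plan.TableDesc")` with a hypothetical path, returns an empty list if none of the listed JARs provides the class.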

Thanks

AKR