Spark2 submit: CDH 6.3.3 using pyspark FAILS
Created on 09-02-2020 11:16 AM - edited 09-16-2022 07:38 AM
I am getting the error below in my pyspark program. It occurs when I try to cache some tables. I get the same error with the DataFrame writer API when I write a DataFrame to a file on HDFS:
File "/tmp/bin/loan_acct_financ_calc_module.py", line 145, in main
spark.sql("cache table {}".format(table))
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 778, in sql
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o92.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 4 times, most recent failure: Lost task 15.3 in stage 0.0 (TID 35, idoop46.devin1.ms.com, executor 5): java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/plan/TableDesc;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
at java.lang.Class.getDeclaredField(Class.java:2068)
at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
Created 09-11-2020 02:34 AM
Hi Suresh,
The error "java.lang.NoClassDefFoundError" (here for org/apache/hadoop/hive/ql/plan/TableDesc, raised on an executor) suggests that a jar file is missing from the classpath. Could you please make sure all the relevant jar files are placed on the relevant classpath? Is this the first time you are running this pyspark program? Could you also share the command you are using to run it?
Thanks
AKR
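
One way to check whether the jar is present on a node is to scan a directory of jars for the class named in the stack trace. The following is only a minimal sketch using the Python standard library; the helper name `find_class_in_jars` is hypothetical, and the directory you pass in is an assumption that depends on how your cluster is laid out.

```python
import os
import zipfile


def find_class_in_jars(jar_dir, class_name):
    """Return the sorted paths of all .jar files under jar_dir that contain class_name.

    class_name is a dotted Java name, e.g. "org.apache.hadoop.hive.ql.plan.TableDesc".
    """
    # Inside a jar, a class is stored as a path with slashes and a .class suffix.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(jar_dir):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(root, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        matches.append(path)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable files, broken symlinks, and files that are not valid archives.
                continue
    return sorted(matches)
```

For example, calling `find_class_in_jars(some_dir, "org.apache.hadoop.hive.ql.plan.TableDesc")` on the executor host named in the trace, with `some_dir` set to the directory that holds your Spark or Hive jars, returns the jars that provide the class. An empty list would indicate that no jar under that directory contains it.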
