Member since: 08-12-2020
Posts: 3
Kudos Received: 0
Solutions: 0
12-08-2020
01:49 PM
Hi, I have permanently deleted the data. Is there any way we can recover the data?
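For illustration only: the post does not say how the data was deleted, but if it lived on HDFS and was removed without -skipTrash while fs.trash.interval was non-zero, the files may still sit in the user's .Trash checkpoint. A minimal sketch of checking and restoring from trash; the user name and paths below are hypothetical:

import subprocess

# Hypothetical paths; adjust to your HDFS user and the data's original location.
TRASH_DIR = "/user/etl/.Trash/Current"    # trash checkpoint for user 'etl'
ORIGINAL_PATH = "/data/warehouse/loans"   # where the deleted data used to live

# List what is still recoverable under the trash checkpoint.
subprocess.run(["hdfs", "dfs", "-ls", "-R", TRASH_DIR], check=True)

# Trash mirrors the original directory layout, so a deleted
# /data/warehouse/loans lands under .Trash/Current/data/warehouse/loans.
# If it is still there, move it back:
subprocess.run(
    ["hdfs", "dfs", "-mv", TRASH_DIR + ORIGINAL_PATH, ORIGINAL_PATH],
    check=True,
)

If the delete bypassed the trash (or the retention interval has passed), this sketch will find nothing and recovery would need to come from snapshots or backups, if any exist.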
09-02-2020
11:16 AM
I am getting the issue below with my PySpark program. It happens when I try to cache some tables, and I get the same issue from the DataFrameWriter API when writing a DataFrame to a file on HDFS (a minimal repro sketch follows the stack trace):
File "/tmp/bin/loan_acct_financ_calc_module.py", line 145, in main
spark.sql("cache table {}".format(table))
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 778, in sql
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4153.5460344/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o92.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 4 times, most recent failure: Lost task 15.3 in stage 0.0 (TID 35, idoop46.devin1.ms.com, executor 5): java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/plan/TableDesc;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
at java.lang.Class.getDeclaredField(Class.java:2068)
at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
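For context, a minimal sketch of the kind of caching loop that triggers the error above. The app name and table names are hypothetical; CACHE TABLE is standard Spark SQL, and the NoClassDefFoundError for org.apache.hadoop.hive.ql.plan.TableDesc suggests Hive classes are missing on the executor classpath rather than a problem in this code:

from pyspark.sql import SparkSession

# Hive support is needed so "cache table" can resolve Hive metastore tables.
spark = (
    SparkSession.builder
    .appName("loan_acct_financ_calc")  # hypothetical app name
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical table list for illustration.
for table in ["db.loan_acct", "db.financ_calc"]:
    spark.sql("cache table {}".format(table))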
08-13-2020
03:02 AM
I am trying to use pandas UDFs in my code. Internally they use Apache Arrow for the data conversion. I am getting the issue below with the pyarrow module, despite importing it explicitly in my application code.

File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 361, in main
func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 236, in read_udfs
arg_offsets, udf = read_single_udf(pickleSer, infile, eval_type, runner_conf)
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 175, in read_single_udf
return arg_offsets, wrap_scalar_pandas_udf(func, return_type)
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 84, in wrap_scalar_pandas_udf
arrow_return_type = to_arrow_type(return_type)
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p3996.4056429/lib/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1585, in to_arrow_type
import pyarrow as pa
ModuleNotFoundError: No module named 'pyarrow'

I also tried to manually enable Arrow, but still no luck:
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
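For reference, a minimal scalar pandas UDF of the kind described. Note that spark.sql.execution.arrow.enabled only toggles Arrow on; it does not install it. The traceback shows the import failing inside worker.py, i.e. on the executors, so pyarrow must be present in the Python interpreter the executors run, not just on the driver. The environment path below is hypothetical:

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("pandas-udf-demo").getOrCreate()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

# A scalar pandas UDF: the function body executes on the executors,
# so 'import pyarrow' happens there, not on the driver.
@pandas_udf(DoubleType())
def plus_one(v):
    return v + 1.0

df = spark.range(5).toDF("x")
df.select(plus_one(df["x"])).show()

# To point the executors at a Python environment that has pyarrow
# installed, one option is (hypothetical path):
#   spark-submit --conf spark.pyspark.python=/opt/envs/pyarrow_env/bin/python app.py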
Labels:
Apache Spark