Created on 01-21-2021 11:53 AM - edited 09-16-2022 07:40 AM
Hi experts:
The Hadoop version that ships with CDH-6.3.4 is Hadoop 3.0.0-cdh6.3.4. The Apache Spark web site does not have a prebuilt tarball for Hadoop 3.0.0, so I downloaded "spark-3.0.1-bin-hadoop3.2.tgz", untarred it, and tried it on our CDH 6.3.4 cluster.
Simple Spark line counting works fine. In a pyspark session, 'show tables' in a Hive database also works, but creating a table fails with this error:
pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table messages1. Invalid method name: 'get_table_req';
That is very similar to what is described here:
I tried replacing the Hive-related jars under the Spark 3.0.1 jars subdirectory with the corresponding ones from /opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p0.6626826/jars, but that did not help; it failed with a different error.
Does anyone have some experience with running Spark 3 in a CDH 6.3.x cluster? Can you suggest anything to try?
Your help is greatly appreciated!
Regards.
Vincent
Created 01-25-2021 08:10 AM
CDH 6.3.x supports Spark 2.4.0 - https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_63_packaging.html#c...
You may find the CSD + Parcel here:
https://docs.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html#packaging
Spark 3 is officially supported in CDP 7.1.5 - https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/cds-3/topics/spark-spark-3-overview.html
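If you still want to experiment with the unsupported combination, the 'get_table_req' error usually points to a mismatch between the Hive client bundled with Spark 3 and the older Hive metastore on the cluster. Rather than swapping jars inside the Spark distribution, Spark SQL has two settings, spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars, that select a different metastore client. Below is a rough, untested sketch. It assumes the cluster's Hive version is 2.1.1 and reuses the parcel jars directory from the question as the client classpath; both values, and the helper names hive_metastore_conf and build_session, are illustrative and should be checked against your cluster.

```python
def hive_metastore_conf(hive_version, hive_jars_dir):
    """Build the Spark SQL settings that select an external Hive metastore client.

    hive_version  -- Hive version of the cluster's metastore (assumed, verify it)
    hive_jars_dir -- directory holding the matching Hive and Hadoop client jars
    """
    # Spark accepts a JVM-style classpath here; "dir/*" picks up every jar in dir.
    classpath = hive_jars_dir.rstrip("/") + "/*"
    return {
        "spark.sql.hive.metastore.version": hive_version,
        "spark.sql.hive.metastore.jars": classpath,
    }


def build_session(conf):
    """Create a Hive-enabled SparkSession with the given settings applied."""
    # Imported lazily so the config helper can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Assumed values: Hive 2.1.1 and the parcel path quoted in the question.
conf = hive_metastore_conf(
    "2.1.1",
    "/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p0.6626826/jars",
)
for key, value in conf.items():
    print(key, "=", value)

# On the cluster (not run here):
# spark = build_session(conf)
# spark.sql("show tables").show()
```

These settings have to be in place before the session starts, so passing them as --conf options to pyspark or spark-submit should work equally well. This remains outside what Cloudera supports on CDH 6.3.x.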
Created 01-25-2021 06:10 PM
Thanks very much @MyNamesNotRick . We will check them out.