Hello team,
We have a CDH 6.2 cluster in the cloud and an on-prem cluster at CDH 5.16. Kindly check the details below and suggest a fix.
We are able to list HDFS content from the cloud gateway node, but running pyspark from the cloud VM fails.
I copied the Spark, HDFS, and YARN configs from the on-prem cluster to a separate path on the cloud gateway node, and exported that path as shown below.
Step1:
export SPARK_CONF_DIR=/app/localstorage/evl_prod/etc/spark2/conf.cloudera.spark2_on_yarn
export SPARK_DIST_CLASSPATH=$(hadoop --config /app/localstorage/evl_prod/etc/hadoop/ classpath)
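As a sanity check (hypothetical helper, not part of the original steps), it can be worth confirming that the directory exported as SPARK_CONF_DIR actually exists and contains a spark-defaults.conf before launching pyspark, since a typo here silently falls back to the default config:

```shell
# Hypothetical sanity check: verify the config dir exported in Step1
# really contains spark-defaults.conf before running pyspark.
check_conf_dir() {
  dir="$1"
  if [ -f "$dir/spark-defaults.conf" ]; then
    echo "ok: $dir"
  else
    echo "missing spark-defaults.conf under $dir" >&2
    return 1
  fi
}

# Example invocation on the gateway node (path from Step1):
# check_conf_dir /app/localstorage/evl_prod/etc/spark2/conf.cloudera.spark2_on_yarn
```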
Step2:
Updated spark.yarn.jars in spark-defaults.conf to point at the Spark jar paths:
spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/jars/*,local:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/hive/*,local:/app/bds/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/spark/lib/*
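Note that this property mixes jar directories from the CDH 6.2 parcel and the CDH 5.16.2 parcel, and `local:` paths are resolved on each NodeManager host, so the containers may load whichever Spark version happens to resolve first. A sketch of a single-version setting, assuming the on-prem NodeManagers only have the CDH 5.16.2 parcel installed at the path shown above:

```
# Hypothetical single-version setting: point spark.yarn.jars at only one
# Spark distribution that actually exists on the on-prem NodeManager hosts.
spark.yarn.jars=local:/app/bds/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/lib/spark/lib/*
```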
Step3: Ran pyspark from the cloud gateway node; it throws the error below in the container stderr, visible in the on-prem ResourceManager UI job logs.
Log Type: stderr
Log Upload Time: Fri Aug 30 06:46:04 -0400 2019
Log Length: 1082
Picked up JAVA_TOOL_OPTIONS: -Doracle.jdbc.thinLogonCapability=o3 -Djava.security.krb5.conf=/etc/krb5_bds.conf
19/08/30 06:46:03 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
Unknown/unsupported param List(--dist-cache-conf, /app/bds/data/yarn/nm/01/usercache/t617351/appcache/application_1567160664350_0006/container_e22_1567160664350_0006_02_000001/__spark_conf__/__spark_dist_cache__.properties)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH          Path to your application's JAR file
  --class CLASS_NAME      Name of your application's main class
  --primary-py-file       A main Python file
  --primary-r-file        A main R file
  --py-files PY_FILES     Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.
  --args ARGS             Arguments to be passed to your application's main class. Multiple invocations are possible, each will be passed in order.
  --properties-file FILE  Path to a custom Spark properties file.
Log Type: stdout
Log Upload Time: Fri Aug 30 06:46:04 -0400 2019
Log Length: 0
Created 08-30-2019 05:12 AM
Hello @VijayM
This error is due to the two clusters having different Spark major versions.
CDH5 ships Spark 1.x, CDH6 ships Spark 2.x - there are major differences between the two, and code written for one may not run on the other.
To resolve this, ensure the Spark versions on the two clusters match.
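Before submitting cross-cluster jobs, it can help to confirm the versions actually match. A hypothetical check - the version strings below are examples you would obtain from `spark-submit --version` on each side:

```shell
# Hypothetical pre-submit check: compare the Spark major.minor version
# reported by each cluster's client before submitting cross-cluster jobs.
cloud_version="2.4.0"     # e.g. from `spark-submit --version` on the cloud gateway
onprem_version="1.6.0"    # e.g. from `spark-submit --version` on the on-prem edge node

# Extract the major.minor part of a version string.
major_minor() {
  echo "$1" | cut -d. -f1-2
}

if [ "$(major_minor "$cloud_version")" != "$(major_minor "$onprem_version")" ]; then
  echo "Spark version mismatch: $cloud_version vs $onprem_version"
fi
```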