Spark submit configuration --deploy-mode has been overridden from cluster to client in single-node minikube
Labels: Apache Spark
Created 06-17-2024 10:38 PM
We are in the process of migrating from YARN to Kubernetes and upgrading our Spark version from 2.4.4 to 3.5.1. As part of this transition, we have decided to use Scala 2.12.18 and have upgraded Java from version 8 to 11. Currently, I am encountering three main issues:
- I am experiencing an ArithmeticException due to long overflow. Could the switch from Java 8 to 11 be causing this issue?
- The deployment mode specified as cluster in the spark-submit command is being overridden to client.
- I am unable to use AWS Hadoop package classes in spark-submit, despite including the jars in the container.
$SPARK_HOME/bin/spark-submit \
--master k8s://$K8S_SERVER \
--deploy-mode cluster \
--name testing \
--class dt.cerebrum.iotengine.sparkjobs.streaming \
--conf spark.kubernetes.file.upload.path=s3a://cb-spark/path \
--conf spark.hadoop.fs.s3a.endpoint="http://xxxxxxx.xxx" \
--conf spark.hadoop.fs.s3a.access.key="xxxx" \
--conf spark.hadoop.fs.s3a.secret.key="xxxxxxxxx" \
--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.fast.upload=true \
--conf spark.hadoop.fs.s3a.path.style.access="true" \
s3a://cb-spark/iot_engine.jar
Any assistance you could provide on these issues would be greatly appreciated.
Thank you.
Created 07-09-2024 11:35 PM
Apache Spark 3.5.1 supports Java 8/11/17 and Scala binary versions 2.12/2.13. If you want to use Scala binary version 2.12, the recommended Scala version is 2.12.18.
Coming to your questions:
1. Without the exception stack trace details, it is difficult to provide a solution.
2. The reason could be in your application code: while creating the SparkSession you may have hard-coded the master or client deploy mode (see the sketch after these points).
3. To use AWS S3, you need to download the hadoop-aws jar files (with the matching AWS SDK bundle) and pass them to the spark-submit command.
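A minimal Scala sketch of point 2, assuming the driver entry point looks roughly like this (object and app names here are placeholders, not taken from the thread). The key idea is that the builder should not set the master or deploy mode itself, otherwise it overrides whatever spark-submit was given:

import org.apache.spark.sql.SparkSession

object DeployModeCheck {
  def main(args: Array[String]): Unit = {
    // Let spark-submit supply the master and deploy mode; calls such as
    // .master("local[*]") or .config("spark.submit.deployMode", "client")
    // here would override the --deploy-mode cluster flag.
    val spark = SparkSession.builder()
      .appName("testing")
      .getOrCreate()

    // Print what actually took effect, to confirm the submit-time settings were honoured.
    println(s"master     = ${spark.sparkContext.master}")
    println(s"deployMode = ${spark.sparkContext.deployMode}")

    spark.stop()
  }
}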
Created 07-15-2024 08:20 PM
Hi @RangaReddy,
1. Exception stack trace:
Currently we are running our Spark jobs on YARN using the same code and we never get this issue. Could it be caused by a lack of memory?
2. We didn't hard-code the client mode anywhere. It was working fine on YARN but not with Kubernetes.
3. We have tried providing the following, but it didn't work. We also downloaded these jars and placed them in the jars folder, but no luck.
--packages org.apache.hadoop:hadoop-aws:3.3.4 \
--packages com.amazonaws:aws-java-sdk-bundle:1.12.262 \
--packages org.apache.spark:spark-hadoop-cloud_2.12:3.5.1 \
--packages org.apache.hadoop:hadoop-client-api:3.3.4 \
--packages org.apache.hadoop:hadoop-client-runtime:3.3.4 \
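One thing worth checking: spark-submit generally expects --packages as a single comma-separated list of coordinates, so repeating the flag as above may leave only the last value in effect. Separately, a minimal Scala sketch (the object name is only illustrative) to verify from inside the container whether the S3A classes actually reached the classpath:

object S3AClasspathCheck {
  def main(args: Array[String]): Unit = {
    try {
      // Succeeds only if hadoop-aws (and its dependencies) are on the classpath.
      Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")
      println("hadoop-aws is on the classpath")
    } catch {
      case _: ClassNotFoundException =>
        println("S3AFileSystem not found: hadoop-aws / aws-java-sdk-bundle jars are missing")
    }
  }
}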
Created 07-11-2024 05:23 AM
Hi @saifikhan
Just from the ArithmeticException alone, we can't suggest a solution. It could be caused by your code or by Apache Spark's code. Check the exception stack trace and fix the issue if it originates from your code.
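For context, "ArithmeticException: long overflow" is typically thrown by the exact-arithmetic helpers in java.lang.Math, whose behaviour is identical on Java 8 and 11, so the Java upgrade by itself is unlikely to be the cause; the stack trace will show whether the overflow comes from application code or from Spark internals. A tiny sketch reproducing the same failure in isolation:

object LongOverflowDemo {
  def main(args: Array[String]): Unit = {
    // Plain multiplication wraps silently and never throws.
    val wrapped = Long.MaxValue * 1000L
    println(s"plain multiplication wraps to: $wrapped")

    // The exact variant throws java.lang.ArithmeticException: long overflow.
    Math.multiplyExact(Long.MaxValue, 1000L)
  }
}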