How to properly execute spark-submit command with Yarn?

I need to run `spark-submit` on a Hadoop cluster created with Ambari. There are 3 instances: 1 master node and 2 executor nodes.

So I logged in to the master node as the `centos` user and executed this command:

sudo -u hdfs spark-submit --master yarn --deploy-mode cluster --driver-memory 6g --executor-memory 4g --executor-cores 2 --class org.tests.GraphProcessor graph.jar

But I got an error message saying that the file graph.jar does not exist. I therefore tried to copy the file to HDFS as follows:

hdfs dfs -put graph.jar /home/hdfs/tmp

However, the error is:

No such file or directory: `hdfs://eureambarimaster1.local.eurecat.org:8020/home/hdfs/tmp'

Re: How to properly execute spark-submit command with Yarn?

@Liana Napalkova The Spark client automatically copies graph.jar to HDFS and distributes it to the cluster. You only need to point to the location of graph.jar on the local file system. For example:

spark-submit --master yarn --deploy-mode cluster --driver-memory 6g --executor-memory 4g --executor-cores 2 --class org.tests.GraphProcessor /path/to/graph.jar
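
If it helps, here is a minimal sketch of how you might check the jar before submitting. It only verifies that the file exists on the local file system, resolves it to an absolute path, and prints the resulting command. The function name `build_submit_cmd` is made up for this illustration, and the memory and core settings are simply copied from the question. One assumption worth checking: if you run the command through `sudo -u hdfs`, the `hdfs` user must be able to read the jar at that path.

```shell
#!/usr/bin/env bash
# Sketch only: build_submit_cmd is a hypothetical helper, not part of Spark.
# It prints the spark-submit command instead of running it.

build_submit_cmd() {
  local jar="$1"

  # Fail early if the jar is not present on the local file system.
  if [ ! -f "$jar" ]; then
    echo "jar not found on local file system: $jar" >&2
    return 1
  fi

  # Resolve to an absolute path so the command does not depend on the
  # current working directory.
  local abs_jar
  abs_jar="$(cd "$(dirname "$jar")" && pwd)/$(basename "$jar")"

  # Options are the ones used in the question above.
  echo "spark-submit --master yarn --deploy-mode cluster" \
       "--driver-memory 6g --executor-memory 4g --executor-cores 2" \
       "--class org.tests.GraphProcessor $abs_jar"
}
```

Calling `build_submit_cmd ./graph.jar` from the directory that holds the jar prints the full command with the absolute path, which you can then inspect and run yourself.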

HTH
