
Remote spark-submit HDFS error

New Contributor

I am trying to launch a spark job from a remote host to the HDP sandbox running on my Mac but keep getting the following error:

spark-assembly-1.6.2-hadoop2.6.0.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

I set this up by copying the /etc/hadoop/conf directory from the sandbox to the host where I am launching my job via spark-submit, and pointing the HADOOP_CONF_DIR environment variable at that directory. Has anyone encountered this problem?
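Roughly, the setup on the remote host looks like this (the paths and names here are illustrative, not my exact values):

	# copy of the sandbox's /etc/hadoop/conf on the remote host
	export HADOOP_CONF_DIR=/home/me/hdp-sandbox-conf
	# local Spark install on the remote host
	export SPARK_HOME=/opt/spark-1.6.2
	$SPARK_HOME/bin/spark-submit \
	    --class com.example.MyApp \
	    --master yarn-cluster \
	    target/scala-2.10/my-app-assembly-1.0.jar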


4 REPLIES

Super Guru
@Antonio Ye

Can you please share your spark-submit command? You do have SPARK_HOME set on the host from which you are launching the job, right?

New Contributor

I have tried a couple of commands:

> spark-submit --class com.myorg.streaming.MyClass --master yarn-cluster target/scala-2.10/SparkStreamingPOC-assembly-1.0.jar

and

> spark-shell --master yarn


@Antonio Ye

Can you check whether you have enough storage in your sandbox?

hdfs dfsadmin -report 

New Contributor

Turns out I needed to add port 50010 to the port-forward list of the VM and add the following to my hdfs-site.xml:

<property>
	<name>dfs.client.use.datanode.hostname</name>
	<value>true</value>
</property>
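The underlying issue: the NameNode hands the client the DataNode's VM-internal IP, which is unreachable from the host, so the client excludes the only DataNode and the write fails with the "could only be replicated to 0 nodes" error. With dfs.client.use.datanode.hostname set to true, the client connects by hostname instead, which the host resolves to the forwarded port. If your sandbox runs in VirtualBox, the forwarding rule can also be added from the command line, for example (the VM name is illustrative; run this with the VM powered off):

	VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "datanode,tcp,,50010,,50010"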