
Remote spark-submit HDFS error

New Member

I am trying to submit a Spark job from a remote host to the HDP sandbox running on my Mac, but I keep getting the following error:

spark-assembly-1.6.2-hadoop2.6.0.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

I set this up by copying the sandbox's /etc/hadoop/conf directory to the host where I launch the job via spark-submit, and setting the HADOOP_CONF_DIR environment variable to point to that copy. Has anyone encountered this problem?
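As a rough sanity check of the setup described above, the following shell sketch verifies that a directory contains the client config files Hadoop tools usually read. It is not from the original post: the function name `check_hadoop_conf_dir` and the list of three files are assumptions chosen for illustration, and a given job may need other files as well.

```shell
# Hypothetical helper: return 0 if the given directory (default: $HADOOP_CONF_DIR)
# exists and contains core-site.xml, hdfs-site.xml and yarn-site.xml.
check_hadoop_conf_dir() {
  dir="${1:-${HADOOP_CONF_DIR:-}}"
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "HADOOP_CONF_DIR is unset or not a directory: '$dir'" >&2
    return 1
  fi
  missing=0
  for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (the path is a placeholder, not taken from the post):
#   export HADOOP_CONF_DIR="$HOME/sandbox-hadoop-conf"
#   check_hadoop_conf_dir && echo "config directory looks complete"
```

A passing check only says the files are present; it does not show that the addresses inside them are reachable from the remote host.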

1 ACCEPTED SOLUTION

New Member

Turns out I needed to add port 50010 to the port-forwarding list of the VM and add the following to my hdfs-site.xml:

<property>
	<name>dfs.client.use.datanode.hostname</name>
	<value>true</value>
</property>
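
For anyone checking the same fix, here is a minimal shell sketch that is not part of the original answer. It assumes bash and the coreutils `timeout` command are available on the client host, and the function names `get_hadoop_prop` and `port_open` are made up for illustration. The first reads a property value out of a Hadoop-style XML file, so you can confirm that the client-side copy of hdfs-site.xml carries the setting; the second probes whether a TCP port (such as the forwarded 50010) accepts connections.

```shell
# Hypothetical helper: print the <value> of a named property in a
# Hadoop-style XML config file. Assumes each <name> is followed by its
# <value> within the same <property> block, as in the snippet above.
get_hadoop_prop() {
  conf_file="$1"
  prop_name="$2"
  awk -v want="$prop_name" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$conf_file"
}

# Hypothetical helper: return 0 if a TCP connection to host:port succeeds
# within 3 seconds. Uses bash's /dev/tcp redirection.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

Something like `get_hadoop_prop "$HADOOP_CONF_DIR/hdfs-site.xml" dfs.client.use.datanode.hostname` should print `true` once the property is in place. Because the client then connects to the DataNode by hostname, that hostname also has to resolve from the remote host, so running `port_open` against the sandbox's hostname and port 50010 is a quick way to test both name resolution and the port forward.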


4 REPLIES

Super Guru
@Antonio Ye

Can you please share your spark-submit command? You do have SPARK_HOME set on the host you are launching the job from, right?

New Member

I have tried a couple:

> spark-submit --class com.myorg.streaming.MyClass --master yarn-cluster target/scala-2.10/SparkStreamingPOC-assembly-1.0.jar

and

> spark-shell --master yarn


@Antonio Ye

Can you check whether you have enough storage in your sandbox?

hdfs dfsadmin -report
