Created 08-16-2016 03:39 AM
I am trying to launch a spark job from a remote host to the HDP sandbox running on my Mac but keep getting the following error:
spark-assembly-1.6.2-hadoop2.6.0.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
I set this up by copying the /etc/hadoop/conf directory from the sandbox to the host where I am launching my job via spark-submit, and setting the HADOOP_CONF_DIR environment variable to point at that directory. Has anyone encountered this problem?
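For reference, the setup steps look roughly like this (the sandbox address and local directory name here are placeholders; substitute your own):

```shell
# Copy the sandbox's Hadoop client configuration to the launching host
# ("sandbox.hortonworks.com" is a placeholder for your VM's address)
scp -r root@sandbox.hortonworks.com:/etc/hadoop/conf ./hadoop-conf

# Point the Spark client at the copied configuration
export HADOOP_CONF_DIR=$PWD/hadoop-conf
```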
Created 08-17-2016 09:05 PM
Turns out I needed to add 50010 (the DataNode transfer port) to the VM's port-forwarding list and add the following to my hdfs-site.xml:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
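If you'd rather not edit hdfs-site.xml on the client, the same setting can be passed per job, since Spark forwards spark.hadoop.* properties into the Hadoop configuration. A sketch, using the submit command from earlier in this thread:

```shell
spark-submit \
  --master yarn-cluster \
  --conf spark.hadoop.dfs.client.use.datanode.hostname=true \
  --class com.myorg.streaming.MyClass \
  target/scala-2.10/SparkStreamingPOC-assembly-1.0.jar
```

This makes the HDFS client connect to DataNodes by hostname rather than by their (VM-internal) IP addresses, which is why it pairs with the 50010 port forward.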
Created 08-16-2016 04:07 AM
Can you please share your spark-submit command? You do have SPARK_HOME set on the host from which you are launching the job, right?
Created 08-16-2016 04:29 AM
I have tried a couple:
> spark-submit --class com.myorg.streaming.MyClass --master yarn-cluster target/scala-2.10/SparkStreamingPOC-assembly-1.0.jar
and
> spark-shell --master yarn