HDFS Write exception running Storm JUnit test

Contributor

I am trying to run a Storm unit test from within my Eclipse environment that writes to HDFS in my Sandbox VM. The topology runs fine, but at the end I get the following exception:

File /tmp/cdr_storm/hdfsWriter-2-0-1462816860314.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

This makes sense, as there is only one datanode in the Sandbox, but I don't see a minReplication setting in HDFS or in the HdfsBolt. My guess is that the HdfsBolt client is trying to write each block to 3 locations in total, and since there is only 1, it runs into the above error. Is this a setting in the client or in the HDFS service, and how can I override it to work in my (limited) development environment?

3 REPLIES

Mentor

You can set the replication factor globally on the Sandbox. I once spoke to the Sandbox team about making that the default; there is no reason to have the RF set to 3 on a single-node VM.
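
For reference, the global change would look something like this on a default HDP Sandbox (a sketch; the same property can be set through Ambari, and the affected services need a restart afterwards):

    <!-- hdfs-site.xml on the Sandbox: default replication
         factor for newly written files -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

Files that already exist keep their old factor; the standard setrep command brings them down too (the path here is the one from the exception above):

    hdfs dfs -setrep -w 1 /tmp/cdr_storm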

Contributor

Thanks Artem. I tried turning down the replication factor in the Sandbox HDFS, but it didn't make a difference. I believe the issue I am facing is the one described in the HCC question posted here; I've opened the necessary port on my Sandbox VM, and now I'm trying to figure out how to pass the referenced configuration property, "dfs.client.use.datanode.hostname", into the HdfsBolt.
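
If it helps, storm-hdfs lets you pass arbitrary Hadoop properties into the bolt: withConfigKey(key) makes the bolt look up a Map in the topology config under that key and copy every entry into its Hadoop Configuration during prepare(). A minimal sketch, assuming Storm 1.x package names and a default Sandbox address; the file name format, record format, and policies are placeholders:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.storm.Config;
    import org.apache.storm.hdfs.bolt.HdfsBolt;
    import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
    import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
    import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
    import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
    import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;

    // Bolt: the "hdfs.config" key is arbitrary, it just has to match below.
    HdfsBolt bolt = new HdfsBolt()
            .withFsUrl("hdfs://sandbox.hortonworks.com:8020")
            .withFileNameFormat(new DefaultFileNameFormat()
                    .withPath("/tmp/cdr_storm").withPrefix("hdfsWriter"))
            .withRecordFormat(new DelimitedRecordFormat())
            .withRotationPolicy(new FileSizeRotationPolicy(5.0f, Units.MB))
            .withSyncPolicy(new CountSyncPolicy(100))
            .withConfigKey("hdfs.config");

    // Topology config: entries in this map end up in the bolt's Hadoop
    // Configuration, so the client resolves datanodes by hostname.
    Map<String, Object> hdfsClientConf = new HashMap<String, Object>();
    hdfsClientConf.put("dfs.client.use.datanode.hostname", "true");

    Config topologyConf = new Config();
    topologyConf.put("hdfs.config", hdfsClientConf);
    // ...submit or LocalCluster-run the topology with topologyConf as usual.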

It means that your HDFS client could not connect to your datanode on port 50010. The connection to the namenode succeeded, which is why you could see the datanode's status, but the client then failed to reach the datanode itself.

(In HDFS, the namenode manages the file namespace and the datanodes. When an HDFS client contacts the namenode, the namenode resolves the target file path and returns the address of the datanode that holds the data; the client then talks to that datanode directly. You can inspect those datanode connections with netstat, because the client connects to exactly the addresses the namenode handed back.)
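
For example, to check this from the client while the topology is writing (50010 is the default datanode transfer port in HDP 2.x):

    # Shows connection attempts to the datanode's transfer port; a
    # connection stuck in SYN_SENT suggests a firewall or routing problem.
    netstat -an | grep 50010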

I solved the problem by (concrete commands are sketched after this list):

  1. opening port 50010 in the firewall,
  2. adding the property "dfs.client.use.datanode.hostname" = "true" to the client configuration, and
  3. adding the datanode's hostname to the hosts file on my client PC.
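
Concretely, steps 1 and 3 might look like this on a Linux Sandbox and client (the iptables rule, IP address, and hostname are placeholders for your own setup; step 2 is the same property shown in the HdfsBolt sketch above):

    # 1) On the Sandbox VM: open the datanode transfer port (assumes iptables)
    iptables -I INPUT -p tcp --dport 50010 -j ACCEPT

    # 3) On the client: map the Sandbox hostname to an address you can reach
    echo "192.168.56.101  sandbox.hortonworks.com" >> /etc/hosts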

I'm sorry for my poor English.