Support Questions

briandoran · ‎09-22-2015

Hello,

I have an 7 node cluster (6 datanodes) and I am executing HDFSClient Put from an application outside the cloudera cluster. Internally the

cluster is configured to use an internal IP (172.x.x.x range)

I get the error below when I issue a hdfs put to the name node ... note the IP returned is172.123.123.123:50010 which is the internal ip address

and not accessible from the application host.

HDFS

2015-09-22 19:26:48.292+01:00 INFO [Thread-11] org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.123.123.123:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) ~[hadoop-common-2.7.0.jar!/:na]
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1610) ~[hadoop-hdfs-2.6.0.jar!/:na]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) [hadoop-hdfs-2.6.0.jar!/:na]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) [hadoop-hdfs-2.6.0.jar!/:na]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) [hadoop-hdfs-2.6.0.jar!/:na]
2015-09-22 19:26:48.292+01:00 INFO [Thread-11] org.apache.hadoop.hdfs.DFSClient - Abandoning BP-383231650-172.16.1.45-1435792324508:blk_1074408707_667883
2015-09-22 19:26:48.316+01:00 INFO [Thread-11] org.apache.hadoop.hdfs.DFSClient - Excluding datanode 172.123.123.123:50010

Wildcard addresses is being used on datanode/namenode

Also, I've tried enabling the following parameter to no avail:

dfs.datanode.use.datanode.hostname
dfs.client.use.datanode.hostname

Is there any way to make the hostname be returned here instead of that IP?

Any pointers appreciated..

Brian

Harsh J · ‎10-01-2015

Glad to hear! Please consider marking this thread as resolved so others with similar problems may find a solution quicker.

View solution in original post

ddekany · ‎08-11-2018

Since it wasn't really described how exactly did you resolve it... The point is that on the client side (it's important that it's not on the server side), set "dfs.datanode.use.datanode.hostname" in the org.apache.hadoop.conf.Configuration object to value "true". If the Configuration object isn't created by your code (like if Spark creates it, in my case), then it depends on what creates it... see its documentation. But some guesses:

Attempt 1: Set it inside $HADOOP_HOME/etc/hadoop/hdfs-site.xml. Hadoop command line tools use that, your Java application though... maybe not.

Attempt 2: Put $HADOOP_HOME/etc/hadoop/ into the Java classpath (or pack hdfs-site.xml into your project under /src/main/resources/, but that's kind of dirty...). This works with Spark.

Spark only: SparkSession.builder().config("spark.hadoop.dfs.client.use.datanode.hostname", "true").[...]

Of course, you may also need to add the domain name of the DataNode-s (as the NameNode knows it) into the /etc/hosts on the computer running your application.

Cloudera Community

Support Questions

HDFS put failing due to internal IP address use