Created 04-03-2018 06:17 PM
I copied hdfs-site.xml and core-site.xml from the Hadoop master node. However, they contain private IPs and local file system references.
I guess I need to replace the private FQDN with the public one? Apart from that, what about the local file system paths they reference? Can I have sample hdfs-site.xml and core-site.xml files that I can use in the PutHDFS processor for a remote HDFS server?
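For reference, a client-side core-site.xml usually only needs fs.defaultFS; the server-side local paths (e.g. dfs.namenode.name.dir, dfs.datanode.data.dir) are not used by a remote client such as NiFi and can be left out. A minimal sketch, with a placeholder hostname (not your actual NameNode):

```xml
<!-- core-site.xml: minimal client config for a remote HDFS.
     namenode.example.com is a placeholder; replace it with the
     NameNode's public FQDN, and 8020 with your NameNode RPC port. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```

For a non-HA, non-secured cluster, the accompanying hdfs-site.xml can often be nearly empty on the client side.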
Edit: I have replaced the private FQDN with the public one and I get:
ERROR [StandardProcessScheduler Thread-6] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=487275f5-3155-3e96-6742-77d854d67d43] HDFS Configuration error - org.apache.hadoop.net.ConnectTimeoutException: 1000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=publicfqdn.compute.amazonaws.com/172.31.x.x:8020]: {}
Created 04-03-2018 08:07 PM
Can you ping that node and/or telnet to that port? It's also possible that even if you connect to the name node, that it will send back private IPs for data nodes, etc. In that case you may need to set the "dfs.datanode.use.datanode.hostname" property in hdfs-site.xml to "true" (see here for more information on that property).
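The hdfs-site.xml change mentioned above can be sketched as follows. Note that dfs.datanode.use.datanode.hostname is a real HDFS property; the client-side companion dfs.client.use.datanode.hostname is often also needed in the config NiFi reads, so both are shown here:

```xml
<!-- hdfs-site.xml: tell HDFS to use DataNode hostnames rather than
     the (private) IPs the NameNode would otherwise report back -->
<configuration>
  <property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>
```

With these set, the client resolves each DataNode by hostname, so public DNS entries for the DataNodes can route the traffic correctly.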
Lastly, what is the version of CDH that you are using, what version of Hadoop does it run, and what version of NiFi/HDF are you using? It is possible that Apache NiFi and/or HDF NiFi are built with Hadoop dependencies incompatible with your cluster. Additionally, HDF NiFi is built with HDP dependencies, so it is possible that HDF NiFi would not be compatible with CDH.
Created 04-03-2018 06:30 PM
If you use Ambari to download the HDFS client configs, the site files you get should be correct for use in NiFi. I'm not sure where you got your site files, but they may have been server-side configs (to use private IPs/names) rather than client configs.
Created 04-03-2018 06:41 PM
@Matt Burgess: What does this error indicate? HDFS Configuration error - org.apache.hadoop.net.ConnectTimeoutException: 1000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel
Created 04-03-2018 06:41 PM
Did you get the client configs from Ambari, or just change your existing site files to use the public FQDN? If the latter, perhaps those ports are not exposed via the public FQDN, or perhaps are mapped to different ports for external access?
Created 04-03-2018 07:06 PM
I am using Cloudera with NiFi, so I got my config files from the Cloudera interface and replaced the private IP with the public one. In core-site.xml, only port 8020 is used, which I believe is not mapped to any other port. @Matt Burgess
Created 04-04-2018 04:33 AM
I needed to whitelist ports 8020 and 50071 on the Hadoop cluster instance. It worked 🙂 Thank you!