I am trying to write files stored on a machine in network A to an HDFS cluster in network B using the Java API. The HDFS components run in Docker containers in distributed mode. Inside the cluster, each node is assigned an internal IP (something like 17.20.x.y) and a hostname of the form 'node-agent-x'. The only IP exposed outside the cluster is that of the namenode. The client program makes an RPC to the namenode to create the file. The namenode then assigns datanodes to which the client should send the file contents as blocks. However, the namenode identifies those datanodes by their hostnames/internal IPs, which are unknown to and unreachable from the client. As expected, the client cannot resolve or connect to them. What should I do or implement so that my client program can send data to the datanodes without exposing the datanodes themselves? Would a proxy or edge node work here?
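The resolution failure described above can be reproduced from the client machine with a small check that uses only the JDK (no Hadoop dependency). This is a minimal sketch: the class name `DatanodeNameCheck` and the default host names `node-agent-1` and `node-agent-2` are placeholders for illustration, and the real names would be whatever the namenode reports. It only tests name resolution, not whether the datanode port is reachable.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DatanodeNameCheck {

    /**
     * Returns true if this machine can resolve the given host name.
     * Note: an IP literal always "resolves", so this says nothing about
     * whether the address is actually routable from the client.
     */
    static boolean isResolvable(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** Checks each host in order and records whether it resolved. */
    static Map<String, Boolean> check(List<String> hosts) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String host : hosts) {
            result.put(host, isResolvable(host));
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder datanode names; pass the real ones as arguments.
        List<String> hosts = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("node-agent-1", "node-agent-2");

        for (Map.Entry<String, Boolean> e : check(hosts).entrySet()) {
            System.out.println(e.getKey() + " -> "
                    + (e.getValue() ? "resolvable" : "NOT resolvable"));
        }
    }
}
```

Running this on the client in network A with the datanode names as arguments shows which of them the client can resolve at all, which helps separate a pure DNS problem from a routing problem.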