
How to write data on HDFS in a different network?

I am trying to write files stored on a machine in network A to an HDFS cluster in network B using the Java API. The components of the HDFS cluster run in Docker containers in distributed mode. Inside the cluster, the nodes are assigned internal IPs (something like 17.20.x.y) and each node has a hostname of the form 'node-agent-x'.
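A minimal sketch of the write path I am describing (the namenode hostname, port, and file path below are placeholders, not my actual values):

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Only the namenode address is reachable from network A (placeholder below).
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        try (FileSystem fs = FileSystem.get(conf);
             OutputStream out = fs.create(new Path("/data/sample.txt"))) {
            // The bytes are streamed as blocks to whichever datanodes the
            // namenode assigns, and those are only known by internal IP/hostname.
            out.write("test data".getBytes(StandardCharsets.UTF_8));
        }
    }
}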

The only IP exposed to the client is the namenode's. The client program makes an RPC call to the namenode to create the file, and the namenode then assigns the datanodes to which the client should send the file contents as blocks. However, the addresses the namenode hands back are the datanodes' hostnames/internal IPs, both of which are unknown and unreachable from the client.

As expected, the client is unable to resolve the IP address/hostname.
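Would setting dfs.client.use.datanode.hostname help here, so that the client connects to datanodes by hostname instead of the internal IP, assuming the 'node-agent-x' names could somehow be made resolvable from network A? Something like this (namenode address is again a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HostnameClientSketch {
    public static FileSystem connect() throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // placeholder
        // Ask the client to connect to datanodes by the hostname the namenode
        // returns (node-agent-x) rather than the reported internal IP; presumably
        // this only works if those hostnames resolve from the client's network.
        conf.setBoolean("dfs.client.use.datanode.hostname", true);
        return FileSystem.get(conf);
    }
}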

What should I do/implement so that my client program can send data to the datanodes without exposing them?

Would it be possible to use a proxy or edge nodes for this?