Hi, each day we will receive 10-20 GB of binary files.
We need to upload these files into HDFS. We also want to limit the client's access to the cluster (the client being the side that delivers the 10-20 GB of files).
What are the best approaches?
We have several ideas:
1. SFTP to a machine on our side (for example, one of our data nodes), then hadoop fs -put from there.
2. hadoop fs -put run from the client side (the side that delivers the data). But we would like to forbid direct remote access to the cluster.
3. WebHDFS (does it actually work reliably?). The problem is the same: we don't want to give the client access to the cluster or its interfaces.
* We don't want to set up Kerberos or anything similar; the cluster sits on a private, secured network.
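For concreteness, option 1 would look roughly like this on our side. This is only a sketch: the landing and HDFS paths are made-up placeholders, and the hadoop commands are echoed as a dry run rather than executed, since they assume a running cluster.

```shell
# Sketch of option 1 (dry run: the hadoop commands are echoed, not executed).
# /landing/incoming and /data/incoming are hypothetical paths.
LANDING=/landing/incoming            # directory the client's SFTP account drops files into
DEST=/data/incoming/$(date +%F)      # dated target directory in HDFS

# 1. The client uploads via SFTP to $LANDING on a machine we control;
#    it never talks to the cluster directly.
# 2. A scheduled job on that machine then pushes the landed files into HDFS:
echo hadoop fs -mkdir -p "$DEST"
echo hadoop fs -put "$LANDING"/*.bin "$DEST"/
echo rm -f "$LANDING"/*.bin          # clean up locally after a successful put
```

The point of this layout is that only the intermediate machine needs cluster access; the client's credentials are limited to one SFTP account on one box.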