09-17-2018 02:10 AM
Hi, I have a requirement that sounds similar to symlinking in Hadoop/HDFS.
There are 2 production clusters: Cluster 1 and Cluster 2
I want to read data of cluster 1 from cluster 2 without copying it.
What came to my mind is: can I run hadoop fs -ls hdfs://namespace1/user/xyz from cluster 2?
I understand that cluster 2 won't know what namespace1 is - but I thought of adding the nameservice-related properties to the hdfs-site.xml of cluster 2 (via the advanced configuration snippet - gateway configs).
Is this possible?
Is there any other alternative? HFTP? (I have never tried either.)
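For context, the idea being asked about is roughly the following: copy Cluster 1's nameservice definition into the client hdfs-site.xml on Cluster 2. This is only a sketch; the nameservice name, namenode IDs, hostnames, and port are assumptions and must match Cluster 1's actual HA configuration.

```xml
<!-- Sketch: appended to the client hdfs-site.xml on Cluster 2
     (e.g. via the gateway advanced configuration snippet).
     Hostnames and the 8020 port below are hypothetical. -->
<property>
  <name>dfs.nameservices</name>
  <!-- the local nameservice first, then the remote one -->
  <value>nameservice2,nameservice1</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.nn1</name>
  <value>nn1.cluster1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.nn2</name>
  <value>nn2.cluster1.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With properties like these in place, hadoop fs -ls hdfs://nameservice1/user/xyz issued from Cluster 2 should be able to resolve Cluster 1's namenodes, assuming network connectivity and (where applicable) matching Kerberos/security settings between the clusters.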
09-17-2018 07:20 AM
I mentioned that, since I am reading data from Cluster 1, I am using hdfs://nameservice1/user/abc on cluster 2.
nameservice1 refers to the namenodes of cluster 1, so what is the issue?
09-17-2018 07:39 AM
I was replying to the idea of symlink.
If you just want to access data from Cluster 1 on Cluster 2 (or anywhere else), make sure the HDFS config files for your client point to Cluster 1. I think the relevant file is hdfs-site.xml.
09-17-2018 11:49 PM - edited 09-17-2018 11:51 PM
I suggest you create two Linux user accounts, one for cluster1 and one for cluster2, and configure each account's .bashrc accordingly.
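A minimal sketch of that idea, assuming a copy of each cluster's client configuration has been placed in hypothetical directories /etc/hadoop/conf.cluster1 and /etc/hadoop/conf.cluster2:

```shell
# In the cluster1 user's ~/.bashrc (path is hypothetical):
export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster1

# In the cluster2 user's ~/.bashrc (path is hypothetical):
export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster2
```

After logging in as the appropriate user, a plain hadoop fs -ls /user/abc then resolves against that cluster's namenodes, with no fully qualified URI needed.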