Created on 09-17-2018 02:10 AM - edited 09-16-2022 06:43 AM
Hi, I have a requirement that sounds similar to symlinking in Hadoop/HDFS.
Requirement:
There are 2 production clusters: Cluster 1 and Cluster 2
I want to read Cluster 1's data from Cluster 2 without copying it.
What came to mind is: can I run hadoop fs -ls hdfs://nameservice1/user/xyz on Cluster 2?
I understand that Cluster 2 won't know what nameservice1 is, but I thought of appending the nameservice-related properties to hdfs-site.xml on Cluster 2 (via the advanced configuration snippet - gateway configs).
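Something like the following is what I had in mind for Cluster 2's client hdfs-site.xml. This is only a sketch: I am assuming Cluster 2's own nameservice is called nameservice2, and the NameNode hostnames and ports are placeholders for Cluster 1's real values.

    <!-- declare Cluster 1's nameservice alongside Cluster 2's own -->
    <property>
      <name>dfs.nameservices</name>
      <value>nameservice2,nameservice1</value>
    </property>
    <!-- the HA NameNode pair backing nameservice1 (Cluster 1) -->
    <property>
      <name>dfs.ha.namenodes.nameservice1</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.nameservice1.nn1</name>
      <value>c1-nn1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.nameservice1.nn2</name>
      <value>c1-nn2.example.com:8020</value>
    </property>
    <!-- client-side failover proxy so hdfs://nameservice1 paths resolve -->
    <property>
      <name>dfs.client.failover.proxy.provider.nameservice1</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>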
Is this possible?
Any other alternative, such as hftp? (I have never tried either approach.)
Thanks
Siddesh
Created 09-17-2018 07:20 AM
I have mentioned that, since I am reading data from Cluster 1, I am using hdfs://nameservice1/user/abc on Cluster 2.
nameservice1 refers to the NameNodes of Cluster 1, so what is the issue?
Thanks
Siddesh
Created 09-17-2018 07:39 AM
I was replying to the idea of a symlink.
If you just want to access data from Cluster1 on Cluster2 (or anywhere else), make sure the HDFS config files for your client point to Cluster1. I think it is hdfs-site.xml.
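For example, something like this should work, assuming you have copied Cluster 1's client configuration into a local directory (the path below is just a placeholder):

    # point the hadoop client at a copy of Cluster 1's config directory
    hadoop --config /etc/hadoop/conf.cluster1 fs -ls /user/xyz

The --config option overrides the default configuration directory for that one invocation, so you don't have to touch Cluster 2's own configs.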
Created on 09-17-2018 11:49 PM - edited 09-17-2018 11:51 PM
I suggest you create two Linux user accounts, one for cluster1 and one for cluster2, and configure each user's .bashrc.
For example:
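A sketch of what I mean; the config directory paths are placeholders for wherever each cluster's client configuration actually lives:

    # in the cluster1 user's ~/.bashrc: make hadoop commands talk to Cluster 1
    export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster1

    # in the cluster2 user's ~/.bashrc: make hadoop commands talk to Cluster 2
    export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster2

With this in place, the same hadoop fs -ls /user/xyz command resolves against whichever cluster the logged-in account is configured for.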