12-13-2017 12:04 AM
We have two Hadoop clusters.
They run different versions but are in the same domain.
Cluster A holds a lot of master data that Cluster B needs to refer to.
The total data size is nearly 100 TB (without replication).
Our network speed is less than 1 Gbps, so transferring this much data takes a long time.
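To put that in perspective, here is a rough back-of-the-envelope estimate of a full copy. It is only a sketch under stated assumptions: decimal units (1 TB = 10^12 bytes), reading "1GB" as a 1 Gbit/s link fully dedicated to the transfer, and no protocol or HDFS overhead.

```python
# Rough estimate of how long a full copy of the dataset would take.
# Assumptions (not from the original post): decimal units (1 TB = 10**12 bytes),
# a 1 Gbit/s link fully dedicated to the transfer, and no protocol overhead.


def transfer_time_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Return the ideal time to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second


DATA_BYTES = 100 * 10**12  # ~100 TB, without replication
LINK_BPS = 1 * 10**9       # assumed 1 Gbit/s upper bound on the link

seconds = transfer_time_seconds(DATA_BYTES, LINK_BPS)
days = seconds / 86_400
print(f"{seconds:,.0f} s ~ {days:.1f} days")
```

This prints roughly 800,000 seconds, a little over nine days. Because the actual link is slower than 1 Gbit/s and real transfers carry overhead, the figure is a lower bound.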
Now I want to combine data from the two clusters in Hue.
My plan is to create an external table on Cluster A that points to Cluster B's data.
When the external table is created on Cluster A, is all of the data copied to Cluster A?
Or is only the metadata for Cluster B's data created on Cluster A?
Since my data is huge, this is the critical point of the plan.
I searched through the docs, and it seems that only metadata is created, but I cannot be sure.
Can anyone confirm?