New Contributor
Posts: 5
Registered: 12-12-2017

Merging data from 2 Hadoop clusters



We have two Hadoop clusters.

They run different versions and are in the same domain.

One is high-spec and runs the higher version; the other is low-spec and runs the lower version.

The lower-version cluster holds a lot of master data that the higher-version cluster needs to refer to.


The data size is nearly 100 TB (not counting replication).

Our network speed is less than 1GB, so transferring the data would take a long time.
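
As a rough sanity check on the transfer time, here is a back-of-the-envelope estimate. It assumes that "less than 1GB" means a link of roughly 1 Gbit/s, which is only an assumption; the function name and the `efficiency` parameter are made up for illustration.

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate days needed to copy size_tb terabytes over a link_gbps link.

    Assumptions: decimal units (1 TB = 1e12 bytes, 1 Gbit/s = 1e9 bits/s);
    efficiency is the fraction of the line rate actually achieved.
    """
    bits = size_tb * 1e12 * 8                        # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 86400                           # seconds -> days


if __name__ == "__main__":
    print(f"{transfer_days(100, 1.0):.1f} days at full line rate")
    print(f"{transfer_days(100, 1.0, efficiency=0.7):.1f} days at 70% utilization")
```

Even at a full 1 Gbit/s this comes to a bit over nine days for 100 TB, and the real link is slower than that.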


How can we merge data from the two clusters using Hue, Hive, Impala, or Spark?

Any ideas?


Thanks in advance.


Re: Merging data from 2 Hadoop clusters

I've been thinking...


Let's narrow it down to Hue only.

There's a thing called an external table.

So what if I create external tables on the lower-version cluster that point to the data on the higher-version cluster?

Then I should be able to merge data from the two clusters.
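
To make the idea concrete, this is roughly the kind of statement I have in mind, built here with a small Python helper so it is easy to adapt. Everything specific is hypothetical: the helper `external_table_ddl`, the table and column names, the NameNode host `nn-new.example.com:8020`, and the path. I have not verified that a `LOCATION` on another cluster works across different Hadoop versions.

```python
def external_table_ddl(table, columns, remote_location, file_format="PARQUET"):
    """Build a Hive CREATE EXTERNAL TABLE statement as a string.

    columns is a list of (name, hive_type) pairs; remote_location is the
    full HDFS URI of the directory that already holds the data.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{remote_location}'"
    )


if __name__ == "__main__":
    # Hypothetical table on the local cluster whose data lives on the other one.
    ddl = external_table_ddl(
        "master_customers",
        [("id", "BIGINT"), ("name", "STRING")],
        "hdfs://nn-new.example.com:8020/data/master/customers",
    )
    print(ddl)  # paste the result into the Hue Hive editor
```

The point of the sketch is only that the table definition is created on one cluster while `LOCATION` names a directory on the other.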

But the issue is that I don't have much disk space on the lower-version cluster.


Does creating an external table mean storing the whole dataset on the lower-version cluster?

Or only the metadata?