
Merging data from 2 hadoop clusters



New Contributor



We have 2 Hadoop clusters.

They run different versions and are in the same domain.

One is high spec with a higher version; the other is low spec with a lower version.

The lower-version cluster holds a lot of master data that the higher-version cluster needs to refer to.


Data size is nearly 100 TB (without replication).

Our network speed is less than 1GB, so it takes time to transfer data.
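
For a rough sense of scale, here is a back-of-envelope sketch of the transfer time. It assumes "1GB" means a 1 Gbit/s link running at full line rate and decimal terabytes; the post does not state the unit, so treat both as assumptions. The function name is only illustrative.

```python
def transfer_days(size_tb: float, link_bits_per_s: float) -> float:
    """Estimate days needed to copy size_tb terabytes over a link.

    Assumes decimal units (1 TB = 10**12 bytes) and that the link is
    fully available to the copy, which is optimistic in practice.
    """
    size_bits = size_tb * 10**12 * 8       # bytes -> bits
    seconds = size_bits / link_bits_per_s  # ideal transfer time
    return seconds / 86_400                # seconds -> days


# Assumption: 100 TB over an ideal 1 Gbit/s link.
print(f"{transfer_days(100, 1e9):.1f} days")
```

Under those assumptions a full copy takes more than a week even in the ideal case, which is why reading the data in place is worth considering.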


How can we merge data from the 2 clusters using Hue, Hive, Impala, or Spark?

Any ideas?


Thanks in advance.


Re: Merging data from 2 hadoop clusters

New Contributor

I've been thinking...


Let's narrow it down to Hue only.

There's a thing called an external table.

So what if I create external tables for the higher-version cluster's data on the lower-version cluster?

Then I should be able to merge data from the 2 clusters.

But the issue is that I don't have much disk space on the lower-version cluster.


Does creating an external table mean storing the whole data set on the lower-version cluster?

Or just the metadata only?
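
On the external-table question above: in Hive, an external table records only the schema and a LOCATION in the metastore, and the data files stay wherever LOCATION points. So the table definition itself takes almost no disk space, although every query still reads the rows across the network. A minimal sketch of what such a definition could look like, built as a string in Python. The host name, port, path, table name, columns, and file format are all hypothetical, and whether the two Hadoop versions in this thread can read each other's HDFS directly is an assumption that would need to be verified.

```python
def external_table_ddl(table, columns, location, stored_as="PARQUET"):
    """Render a Hive CREATE EXTERNAL TABLE statement.

    Only this definition goes into the local metastore; the data files
    remain at `location`, which may be a URI on the other cluster.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical example: point at a directory on the other cluster's NameNode.
ddl = external_table_ddl(
    "master_customers",                       # hypothetical table name
    [("id", "BIGINT"), ("name", "STRING")],   # hypothetical columns
    "hdfs://other-nn.example.com:8020/warehouse/master_customers",
)
print(ddl)
```

The printed statement can be pasted into the Hive editor in Hue. Dropping an external table removes only the metadata, not the remote files.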



