Best option to run a query on Hadoop cluster A and import that data into Hadoop cluster B

Expert Contributor

Hi,

We have two different Hadoop clusters (Cloudera and HDP). I have to run an import query on Impala twice a day to import that data into our Hadoop cluster. What is the best way to do this? I see that Sqoop only imports from relational databases; maybe DistCp would work, but what if I want to run a query with WHERE conditions?

Is there any other way? If I decide to get all the data from the partition using DistCp anyway, what ports should be open on both clusters?

Thanks in advance.

4 REPLIES

@PJ

You can try this!

insert overwrite directory "hdfs://<hdpClusterNameNodeHost>:<NNPort>/<yourDirStructure>/" <your query here>

Let me know if that works for you.
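For example, a filtered export to the remote cluster could look like this (the host, directory, table, and partition column below are placeholders, not values from this thread):

insert overwrite directory "hdfs://hdpnn:8020/data/export/"
select * from source_table where load_date = '2018-01-01';

This writes the query result as files under the given directory on the HDP cluster's HDFS, so the WHERE filtering happens on the source side before anything crosses clusters.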

Expert Contributor
@Rahul Soni

Thanks for the reply, but I get this error:

hive> insert overwrite directory "hdfs://nn:8020/test/" select * from table;

FAILED: SemanticException Error creating temporary folder on: hdfs://nn:8020/test

Sweet! Probably the user doesn't have permission to write on that cluster, then 🙂

Can you please confirm?
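One quick way to check, assuming shell access to the HDP cluster (the path matches the error above; these are standard hdfs dfs commands):

hdfs dfs -ls hdfs://nn:8020/              # who owns the root and what mode it has
hdfs dfs -ls hdfs://nn:8020/test          # the target directory itself, if it exists
hdfs dfs -chmod 777 hdfs://nn:8020/test   # temporarily open it up to rule out permissions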

Expert Contributor
@Rahul Soni

I am using the exact same user on both clusters, and this user has all permissions.
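If permissions really are wide open, another common cause of this SemanticException is that Hive cannot create its staging (scratch) directory on the remote filesystem. In that case, a workaround is the two-step approach from the original question: write the query result to the local cluster first, then copy it across with DistCp. A rough sketch, with placeholder hosts, paths, and table names:

hive -e "insert overwrite directory '/tmp/export/' select * from source_table where load_date = '2018-01-01';"
hadoop distcp hdfs://cdhnn:8020/tmp/export hdfs://hdpnn:8020/data/import

As for the ports question: DistCp typically needs the NameNode RPC port (8020 by default) and the DataNode data-transfer port (50010 by default on Hadoop 2.x) reachable between the clusters; the exact ports depend on your versions and configuration.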
