
best option to run a query on hadoop cluster A and import that data in hadoop cluster B

Expert Contributor

Hi,

We have two different Hadoop clusters (Cloudera and HDP). I have to run a query on Impala twice a day and import the resulting data into our other Hadoop cluster. What is the best way to do this? As far as I can see, Sqoop only imports from relational databases. DistCp might work, but what if I want to run a query with WHERE conditions?

Is there any other way? If I decide to copy all the data from a partition with DistCp anyway, what ports need to be open on both clusters?

Thanks in advance.

4 REPLIES

Re: best option to run a query on hadoop cluster A and import that data in hadoop cluster B

@PJ

You can try this!

insert overwrite directory "hdfs://<hdpClusterNameNodeHost>:<NNPort>/<yourDirStructure>/" <your query here>

Let me know if that works for you.
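For reference, a fuller sketch of that approach. This is run on the source (Impala/CDH-side) cluster and writes the query result straight into the destination cluster's HDFS; the hostname, port, paths, and table/partition names below are placeholders, not values from this thread.

```shell
# Run on the source cluster. Writes the filtered result directly into
# the destination (HDP) cluster's HDFS over its NameNode RPC endpoint.
# hdp-nn.example.com, /landing/sales/, and the sales table are hypothetical.
hive -e '
  INSERT OVERWRITE DIRECTORY "hdfs://hdp-nn.example.com:8020/landing/sales/"
  SELECT * FROM sales WHERE ds = "2018-01-01";
'
```

Note that the user running the query needs write permission on that target directory in the remote cluster, since Hive creates a temporary/staging folder there before writing the final files.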

Re: best option to run a query on hadoop cluster A and import that data in hadoop cluster B

Expert Contributor
@Rahul Soni

Thanks for the reply, but I get this error:

hive> insert overwrite directory "hdfs://nn:8020/test/" select * from table;

FAILED: SemanticException Error creating temporary folder on: hdfs://nn:8020/test

Re: best option to run a query on hadoop cluster A and import that data in hadoop cluster B

Sweet! Probably the user doesn't have permission to write on that cluster then :)

Can you please confirm?

Re: best option to run a query on hadoop cluster A and import that data in hadoop cluster B

Expert Contributor
@Rahul Soni

I am using the exact same user on both clusters, and this user has all permissions.
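If the direct cross-cluster write keeps failing, a common fallback is to materialize the query result on the source cluster first and then push the files across with DistCp. This also answers the earlier port question: DistCp needs the NameNode RPC port (8020 by default) and the DataNode data-transfer port (50010 by default) reachable between the clusters. Hosts, paths, and the table below are placeholders, not values from this thread.

```shell
# Step 1: run the filtered query locally on the source cluster and
# write the result to a staging directory in its own HDFS.
hive -e '
  INSERT OVERWRITE DIRECTORY "/tmp/export/sales/"
  SELECT * FROM sales WHERE ds = "2018-01-01";
'

# Step 2: copy the exported files to the destination cluster.
# Requires NameNode RPC (default 8020) and DataNode transfer
# (default 50010) ports open between the two clusters.
hadoop distcp \
  hdfs://cdh-nn.example.com:8020/tmp/export/sales \
  hdfs://hdp-nn.example.com:8020/landing/sales
```

This sidesteps the remote temporary-folder issue entirely, at the cost of an extra copy on the source cluster.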
