Created 04-16-2018 04:04 PM
We have two different Hadoop clusters (Cloudera & HDP). I have to run a query on Impala twice a day to import that data into our Hadoop cluster. What is the best way to do this? I see Sqoop only imports from relational DBs; maybe DistCp, but what if I want to run a query with WHERE conditions?
Is there any other way? If I decide to get all the data from a partition using DistCp anyway, what ports should be open on both clusters?
Thanks in advance.
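(For reference, a DistCp run between two clusters typically looks like the sketch below; the host names and paths are hypothetical placeholders, not from this thread. DistCp needs to reach the NameNode RPC port, commonly 8020, and the DataNode data-transfer port, commonly 50010 on HDP 2.x, on both clusters, plus the YARN ports on whichever cluster runs the job.)

```shell
# Sketch only -- hosts, ports, database/table names and dates are placeholders.
# Copy one partition directory from the CDH cluster into the HDP cluster.
hadoop distcp \
  hdfs://cdh-nn.example.com:8020/user/hive/warehouse/my_db.db/my_table/day=2018-04-16 \
  hdfs://hdp-nn.example.com:8020/landing/my_table/day=2018-04-16
```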
Created 04-16-2018 05:53 PM
You can try this!
insert overwrite directory "hdfs://<hdpClusterNameNodeHost>:<NNPort>/<yourDirStructure>/" <your query here>
Let me know if that works for you.
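For example, with a WHERE clause it might look like this (the host, database and table names below are made up, and this assumes you run it from the source cluster so the result lands directly in the remote HDFS):

```shell
# Hypothetical names -- adjust the NameNode host/port, target path and query.
hive -e "
INSERT OVERWRITE DIRECTORY 'hdfs://hdp-nn.example.com:8020/landing/my_table/'
SELECT * FROM my_db.my_table
WHERE day = '2018-04-16';
"
```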
Created 04-16-2018 06:44 PM
Thanks for the reply, but I get this error:
hive> insert overwrite directory "hdfs://nn:8020/test/" select * from table;
FAILED: SemanticException Error creating temporary folder on: hdfs://nn:8020/test
Created 04-16-2018 07:56 PM
Sweet! Probably the user doesn't have permission to write on that cluster then 🙂
Can you please confirm?
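If it does turn out to be a permissions problem, something along these lines on the destination (HDP) cluster usually unblocks it (the user name, host and path are placeholders, not from this thread):

```shell
# Check what the current ownership/permissions on the target actually are:
hdfs dfs -ls hdfs://hdp-nn.example.com:8020/

# As a superuser (e.g. the hdfs user) on the destination cluster,
# create the target directory and hand it to the writing user:
hdfs dfs -mkdir -p hdfs://hdp-nn.example.com:8020/test
hdfs dfs -chown myuser:myuser hdfs://hdp-nn.example.com:8020/test
```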
Created 04-16-2018 09:56 PM