We have two different Hadoop clusters (Cloudera and HDP). I have to run an import query on Impala twice a day to bring that data into our Hadoop cluster. What is the best way to do this? As far as I can see, Sqoop only imports from relational databases. Maybe distcp, but what if I want to run a query with WHERE conditions?
Is there any other way? If I decide to copy all the data from a partition using distcp anyway, what ports need to be open on both clusters?
Thanks in advance.
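For the distcp route, here is a minimal sketch of copying one partition directory between clusters. The host names (src-nn, dst-nn) and paths are hypothetical placeholders, not values from this thread. On the ports: distcp runs as a MapReduce job, so the nodes running the copy tasks generally need to reach the NameNode RPC port (commonly 8020) and the DataNode data transfer port (50010 by default on Hadoop 2.x, 9866 on Hadoop 3.x) of the other cluster. These are defaults only, so check fs.defaultFS and dfs.datanode.address in your own configs.

```shell
#!/usr/bin/env bash
# Hypothetical hosts and paths; replace with your own.
SRC="hdfs://src-nn:8020/warehouse/mytable/part=example"
DST="hdfs://dst-nn:8020/data/mytable/part=example"

# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

copy_partition() {
  # -update copies only files that are missing or differ at the destination
  run hadoop distcp -update "$SRC" "$DST"
}

copy_partition
```

If you need WHERE conditions, one option is to first write the filtered result into a staging directory on the source cluster and then point SRC at that directory.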
Thanks for the reply, but I get this error:
hive> insert overwrite directory "hdfs://nn:8020/test/" select * from table;
FAILED: SemanticException Error creating temporary folder on: hdfs://nn:8020/test
Sweet! Probably the user doesn't have permission to write on that cluster, then :)
Can you please confirm?
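One way to check the permission theory is a rough sketch like the one below, run as the same user that runs the Hive query. The target URI is the one from the error message; the marker file name _write_test is just a made-up name for illustration.

```shell
#!/usr/bin/env bash
# Target directory taken from the error message above.
TARGET="hdfs://nn:8020/test"

# Print the commands by default; set DRY_RUN=0 to actually run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

check_write_access() {
  # Show owner, group and mode of the directory itself (not its contents)
  run hdfs dfs -ls -d "$TARGET"
  # Try to create an empty marker file as the current user, then remove it
  run hdfs dfs -touchz "$TARGET/_write_test"
  run hdfs dfs -rm "$TARGET/_write_test"
}

check_write_access
```

If the touchz step fails with a permission error, the directory owner or an HDFS superuser on that cluster would need to adjust ownership or mode before the insert overwrite directory can work.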