We have two different Hadoop clusters (Cloudera and HDP). I have to run a query on Impala twice a day and import the resulting data into our Hadoop cluster. What is the best way to do this? As far as I can see, Sqoop only imports from relational databases. DistCp might work, but what if I want to run a query with WHERE conditions?
Is there any other way? If I decide to copy all the data from a partition using DistCp anyway, what ports need to be open on both clusters?
Thanks in advance.