Fetch missing ids from impala with sqoop


I have a table in Impala, and every day I want to check the source table with Sqoop to see whether any ids are missing from it. So far I have done the following:

  1. sqoop import all the ids from the source table into a staging table (sqoop_table)
  2. select id from sqoop_table where id not in (select id from impala_table)
  3. save the result to a .txt file
  4. run the .txt through sed and store the output in a variable, so that the ids go from vertical (one per line) to horizontal (a single line)

This is where the problems start. When I pass this variable to Sqoop to fetch only the missing ids, it fails with an "Argument list too long" error.

The problem is that I cannot change the maximum size allowed for variables. On average there are about 40k ids for every 2 days.
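
To make the size problem concrete, here is a minimal sketch of one possible workaround: splitting the id list into fixed-size batches so that no single command line has to carry all 40k ids. This is not something I have working. It assumes the ids are numeric and stored one per line in the .txt, and the helper names (`batched`, `where_clauses`), the batch size of 1000, and the column name `id` are all hypothetical.

```python
from typing import Iterable, Iterator, List


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` ids, skipping blank lines."""
    batch: List[str] = []
    for raw in ids:
        value = raw.strip()
        if not value:
            continue  # ignore empty lines in the exported file
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def where_clauses(ids: Iterable[str], size: int = 1000, column: str = "id") -> List[str]:
    """Build one `column IN (...)` predicate per batch (numeric ids assumed)."""
    return [f"{column} IN ({','.join(batch)})" for batch in batched(ids, size)]


if __name__ == "__main__":
    # Stand-in for the lines of the exported .txt (one id per line).
    sample = [f"{n}\n" for n in range(1, 8)]
    for clause in where_clauses(sample, size=3):
        print(clause)
```

Each resulting predicate could then be passed to a separate `sqoop import` run through its `--where` option, so every invocation stays well below the argument-length limit. The batch size is an arbitrary choice and would need tuning against both the shell limit and whatever the source database accepts in an `IN` list.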

Is there any other way to compare the remote table with my Impala table and fetch only the missing records?
