I have a table in Impala, and every day I want to check the source table with Sqoop to see whether any ids are missing. For this purpose I have done the following:
- sqoop import all the ids from the source table into a staging table (sqoop_table)
- select id from sqoop_table where id not in (select id from impala_table)
- save the result to a .txt
- create a variable and store the sed-processed .txt in it, so that the results go from vertical (one id per line) to horizontal (a single line).
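
For reference, the last two steps look roughly like the sketch below. The file name missing_ids.txt and the sample ids are placeholders, and paste stands in for the sed command, which is not shown here; both approaches should give a single line.

```shell
# Placeholder data: in practice this file holds the result of the
# NOT IN query, one id per line.
printf '%s\n' 101 102 103 > missing_ids.txt

# Join the lines into one comma-separated line (vertical -> horizontal).
# paste -s is used here as a stand-in for the sed command.
ids=$(paste -sd, missing_ids.txt)

echo "$ids"
```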
The last step is where my issues start. When I try to pass this variable to sqoop to fetch only the missing ids, it throws an "Argument list too long" error.
The thing is that I cannot change the maximum allowed size. The average number of ids for 2 days is 40k.
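
To give an idea of the sizes involved, here is a rough sketch that builds a list of about 40k ids and prints its length next to the system limit. The 7-digit ids are made up for illustration, so the real size will differ. As far as I understand, the limit applies to the arguments handed to a new process rather than to the shell variable itself, and on Linux a single argument is commonly capped at 128 KiB regardless of ARG_MAX.

```shell
# Simulate roughly 40k numeric ids in one comma-separated string.
# The id range is an assumption, used only to estimate the size.
ids=$(seq 1000000 1039999 | paste -sd, -)

# Length of the joined list (bytes, since the content is plain ASCII).
list_bytes=${#ids}

# Total limit for arguments plus environment, as reported by the system.
arg_max=$(getconf ARG_MAX)

echo "list size: $list_bytes bytes"
echo "ARG_MAX:   $arg_max bytes"
```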
Is there any other way to compare the remote table with my impala table and fetch only the missing records?