Fetch missing ids from Impala with Sqoop
Labels: Apache Impala, Apache Sqoop
Created on 07-08-2022 06:48 AM - last edited on 07-08-2022 02:07 PM by ask_bill_brooks
I have a table in Impala, and every day I want to check it against the source table using Sqoop to see whether any ids are missing. So far I have done the following:
- sqoop import all the ids from the source table into a staging table (sqoop_table)
- select id from sqoop_table where id not in (select id from impala_table)
- save the result to a .txt file
- create a shell variable and store the .txt contents in it after running them through sed, so that the ids go from one per line (vertical) to a single line (horizontal).
This is the step where I run into problems. When I pass this variable to sqoop to fetch only the missing ids, it fails with an "argument list too long" error.
I cannot change the maximum size allowed for variables. On average there are about 40k ids for 2 days.
Is there any other way to compare the remote table with my Impala table and fetch only the missing records?
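One possible way around the length limit is to split the missing ids into smaller batches and run one sqoop import per batch instead of passing all 40k ids in a single argument. The snippet below is only a rough sketch of the batching part, assuming the ids are numeric and stored one per line; the function names `batched` and `where_clauses`, the batch size of 1000, and the column name `id` are illustrative choices, not something taken from my actual setup.

```python
from typing import Iterable, Iterator, List


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` ids (hypothetical helper)."""
    batch: List[str] = []
    for raw in ids:
        value = raw.strip()
        if not value:
            continue  # skip blank lines from the exported .txt
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # last, possibly shorter, batch


def where_clauses(ids: Iterable[str], size: int = 1000, column: str = "id") -> List[str]:
    """Build one `column IN (...)` predicate per batch.

    Assumption: ids are integers. int() raises ValueError on anything else,
    which also keeps stray text out of the generated SQL.
    """
    clauses: List[str] = []
    for batch in batched(ids, size):
        numbers = [str(int(value)) for value in batch]
        clauses.append(f"{column} IN ({','.join(numbers)})")
    return clauses


# Example: lines as they would come from iterating over the .txt file
print(where_clauses(["1\n", "2\n", "\n", "3\n"], size=2))
```

Each resulting predicate is short enough to be passed to a separate `sqoop import` run through its `--where` argument, or written into a file that is read with `--options-file`, so that no single command line carries the full id list. With about 40k ids and a batch size of 1000 this would mean roughly 40 imports, so the batch size is a trade-off between command length and the number of sqoop jobs.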
