Created 06-13-2021 10:42 PM
Hi All,
I am trying to achieve incremental sqoop import into hive when a new record is created as well when updates happen to the existing records.
We have tables having both created date column and modified date column. But the modified date column will be updated only when there is any update in the existing records and will remain null when a new record is inserted. Only the created date will be updated when new records are created.
So, when we try the incremental lastmodified option it always looks for the modified date column but our source tables doesnt have values in the modified date column always as it keeps NULL for new records and only updates the value when an existing table gets modified.
So, for this kind of source DB tables with such columns is there any solution in sqoop import.
Also, sqoop incremental with lastmodifed doesnt work with hive tables and we have to create a table already and then run this incremental lastmodified
So, please help on which approach will suit better to do an incremental import of our data into Hive tables.
Need a solution badly for this scenario. please help if you have done this before.
Created 06-14-2021 10:12 PM
Hi @sass,
To use sqoop, you will need to combine the two columns, your CREATED_ON column and UPDATED_ON column. Unfortunately, sqoop is only able to look at one timestamp column as the control for incremental load.
As for the error you are getting, that is expected due to SQOOP-2986. The alternative strategy is to do a regular sqoop import (without hive-import) and provide --target-dir to scoop that coincides with the Hive table location.
Hope this helps,
Alex
Created 06-14-2021 10:12 PM
Hi @sass,
To use sqoop, you will need to combine the two columns, your CREATED_ON column and UPDATED_ON column. Unfortunately, sqoop is only able to look at one timestamp column as the control for incremental load.
As for the error you are getting, that is expected due to SQOOP-2986. The alternative strategy is to do a regular sqoop import (without hive-import) and provide --target-dir to scoop that coincides with the Hive table location.
Hope this helps,
Alex
Created 06-15-2021 02:24 AM
@sass Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,Created 06-16-2021 10:26 PM
Yes, I was doing both append and incremental last modified by giving warehouse directory which did not help in merging the records. But when I gave target directory it worked fine. Now, new records get updated on the hive table with incremental append script and updated records are entered into hive using lastmodified with updated_date but with target directory as parameter and without giving hive-import parameter. Thanks for suggesting it. That's what already worked,
Created on 02-01-2022 12:05 AM - edited 02-01-2022 12:05 AM