- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Incremental sqoop import into hive with tables having created date and modified date
- Labels:
-
Apache Hive
-
Apache Sqoop
Created 06-13-2021 10:42 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I am trying to achieve incremental sqoop import into hive when a new record is created as well when updates happen to the existing records.
We have tables having both created date column and modified date column. But the modified date column will be updated only when there is any update in the existing records and will remain null when a new record is inserted. Only the created date will be updated when new records are created.
So, when we try the incremental lastmodified option it always looks for the modified date column but our source tables doesnt have values in the modified date column always as it keeps NULL for new records and only updates the value when an existing table gets modified.
So, for this kind of source DB tables with such columns is there any solution in sqoop import.
Also, sqoop incremental with lastmodifed doesnt work with hive tables and we have to create a table already and then run this incremental lastmodified
--incremental lastmodified option for hive imports is not supported. Please remove the parameter --incremental lastmodified."
So, please help on which approach will suit better to do an incremental import of our data into Hive tables.
Need a solution badly for this scenario. please help if you have done this before.
Created 06-14-2021 10:12 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @sass,
To use sqoop, you will need to combine the two columns, your CREATED_ON column and UPDATED_ON column. Unfortunately, sqoop is only able to look at one timestamp column as the control for incremental load.
As for the error you are getting, that is expected due to SQOOP-2986. The alternative strategy is to do a regular sqoop import (without hive-import) and provide --target-dir to scoop that coincides with the Hive table location.
Hope this helps,
Alex
Created 06-14-2021 10:12 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @sass,
To use sqoop, you will need to combine the two columns, your CREATED_ON column and UPDATED_ON column. Unfortunately, sqoop is only able to look at one timestamp column as the control for incremental load.
As for the error you are getting, that is expected due to SQOOP-2986. The alternative strategy is to do a regular sqoop import (without hive-import) and provide --target-dir to scoop that coincides with the Hive table location.
Hope this helps,
Alex
Created 06-15-2021 02:24 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@sass Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,Community Manager
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Created 06-16-2021 10:26 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, I was doing both append and incremental last modified by giving warehouse directory which did not help in merging the records. But when I gave target directory it worked fine. Now, new records get updated on the hive table with incremental append script and updated records are entered into hive using lastmodified with updated_date but with target directory as parameter and without giving hive-import parameter. Thanks for suggesting it. That's what already worked,
Created on 02-01-2022 12:05 AM - edited 02-01-2022 12:05 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
