Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Updating hive table with sqoop from mysql table

Expert Contributor

Hi everyone, I already have a Hive table called roles. I need to update this table with info coming from MySQL, so I used this script thinking that it would both add and update data in my Hive table:

sqoop import --connect jdbc:mysql://xxxx/retail_export --username xxxx --password xxx \
  --table roles --split-by id_emp --check-column id_emp --last-value 5 --incremental append \
  --target-dir /user/ingenieroandresangel/hive/roles --hive-import --hive-database poc --hive-table roles

Unfortunately, that only inserts the new data; I can't update the records that already exist. Before you ask, a couple of notes:

* The table doesn't have a PK.

* If I don't specify --last-value as a parameter, I get duplicated records for the rows that already exist.

How can I work this out without truncating the table or recreating it with a PK? Is there a way?

Thanks, guys.

1 ACCEPTED SOLUTION

Super Collaborator

@Andres Urrego,

What you are looking for (upserts) isn't available in Sqoop import.

There are several approaches to actually updating data in Hive. One of them is described here:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-access/content/incrementally-updati...

Other approaches use a side load and merge, either as a post-Sqoop step or as scheduled jobs/processes.
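To make the side-load-and-merge idea concrete, here is a hedged sketch under the assumptions from the question (the `xxxx` connection placeholders, the `id_emp` key, and the `poc.roles` table); the staging path and the `roles_staging` table are hypothetical names, and the staging table's columns must be declared to match your actual `roles` schema:

```shell
# 1) Sqoop the changed MySQL rows into a staging directory only
#    (no --hive-import), so the base Hive table is untouched.
sqoop import \
  --connect jdbc:mysql://xxxx/retail_export \
  --username xxxx --password xxx \
  --table roles --split-by id_emp \
  --target-dir /user/ingenieroandresangel/staging/roles \
  --delete-target-dir

# 2) Expose the staged files as an external Hive table, then rebuild the
#    base table: take every staged row, plus every existing row whose
#    id_emp does NOT appear in staging (i.e. staged rows win on conflict).
hive -e "
  CREATE EXTERNAL TABLE IF NOT EXISTS poc.roles_staging (
    id_emp INT  -- ...remaining columns must mirror poc.roles
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/ingenieroandresangel/staging/roles';

  INSERT OVERWRITE TABLE poc.roles
  SELECT * FROM poc.roles_staging
  UNION ALL
  SELECT r.*
  FROM poc.roles r
  LEFT JOIN poc.roles_staging s ON r.id_emp = s.id_emp
  WHERE s.id_emp IS NULL;
"
```

Note this rewrites the whole table on each run, so it trades write cost for not needing ACID support; it also assumes `id_emp` uniquely identifies a row, which matters since the table has no declared PK.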

You can also look at Hive ACID transactions, or at the Hive-HBase integration package.
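If the table can be made ACID (transactional, ORC-backed, on a Hive version with MERGE support, i.e. Hive 2.2+), the upsert becomes a single statement. A hedged sketch, again assuming `id_emp` as the join key and a hypothetical staging table `poc.roles_staging` and column `role_name`:

```shell
# MERGE requires poc.roles to be a transactional ORC table, e.g. created
# WITH TBLPROPERTIES ('transactional'='true'). Column names here are
# illustrative assumptions, not the asker's real schema.
hive -e "
  MERGE INTO poc.roles AS t
  USING poc.roles_staging AS s
  ON t.id_emp = s.id_emp
  WHEN MATCHED THEN
    UPDATE SET role_name = s.role_name
  WHEN NOT MATCHED THEN
    INSERT VALUES (s.id_emp, s.role_name);
"
```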

Choosing the right approach is not trivial and depends on: initial volume, incremental volumes, frequency of incremental jobs, probability of updates, ability to identify record uniqueness, acceptable latency, etc.

