Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Updating hive table with sqoop from mysql table

Solved Go to solution

Updating hive table with sqoop from mysql table

Rising Star

Hi everyone, I have already a hive table called roles. I need to update this table with info coming up from mysql. So, I have used this script think that it will add and update new data on my hive table:`

sqoop import --connect jdbc:mysql://xxxx/retail_export --username xxxx --password xxx \ --table roles --split-by id_emp --check-column id_emp --last-value 5 --incremental append \ --target-dir /user/ingenieroandresangel/hive/roles --hive-import --hive-database poc --hive-table roles

Unfortunatelly that only insert the new data but I can't update the record that already exits. before you ask a couple of statements:

* the table doesn't have a PK

* if i dont specify --last-value as a parametter I will get duplicated records for those who already exist.

How could I figure it out without apply a truncate table or recreate the table using a PK? exist the way?

thanks guys.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Updating hive table with sqoop from mysql table

Expert Contributor

@Andres Urrego,

What you are looking for (UPSERTS) aren't available in SQOOP-import.

There are several approaches on how to actually update data in Hive. One of them is described here:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-access/content/incrementally-updati...

Other approaches are also using side load and merge as post-sqoop or scheduled jobs/processes.

You can also check Hive ACID transactions, or using Hive-Hbase integration package.

Choosing right approach is not trivial and depends on: initial volume, incremental volumes, frequency or incremental jobs, probability of updates, ability to identify uniqueness of records, acceptable latency, etc...

1 REPLY 1

Re: Updating hive table with sqoop from mysql table

Expert Contributor

@Andres Urrego,

What you are looking for (UPSERTS) aren't available in SQOOP-import.

There are several approaches on how to actually update data in Hive. One of them is described here:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-access/content/incrementally-updati...

Other approaches are also using side load and merge as post-sqoop or scheduled jobs/processes.

You can also check Hive ACID transactions, or using Hive-Hbase integration package.

Choosing right approach is not trivial and depends on: initial volume, incremental volumes, frequency or incremental jobs, probability of updates, ability to identify uniqueness of records, acceptable latency, etc...

Don't have an account?
Coming from Hortonworks? Activate your account here