Reply
Highlighted
Explorer
Posts: 14
Registered: ‎02-08-2019

constraints on a hive table after sqooping

[ Edited ]

Hello, 

 

 I would like to know that when I am sqooping an oracle table with primary key and importing it to hive, the constraints like primary key remain at the hive table ? I am using a command like the following: 

sqoop import --connect ... --table MY_TABLE --hive-import  --create-hive-table ....

 

Thans in advance.

Posts: 1,894
Kudos: 433
Solutions: 303
Registered: ‎07-31-2013

Re: constraints on a hive table after sqooping

Two things here:

First, yes, Sqoop will copy only what data comes through the connection and the query, and will not duplicate data as part of its import process. The divided tasks are each fully re-done with no partial results kept around if there is a failure/retry/speculative execution during its run.

However, keep in mind that Hive has no such constraints in its own architecture (no concept of a primary key). So after your import, it is upto your use of the table and the updates you make to it to maintain that 'effect'.

You can consider using Kudu + Impala instead of Hive if the notion of primary key(s) is important to your use, although Sqoop doesn't offer a way to directly import data to it (You'll need to insert out of the Hive table into the Kudu one via Impala, after the Sqoop import to Hive is done).
Explorer
Posts: 14
Registered: ‎02-08-2019

Re: constraints on a hive table after sqooping

@Harsh J  In the link of the official documentation of Hive QL below HIVE QL  

PRIMARY and FOREIGN constraints have been added. When you do sqoop importing to HIVE the new table is manipulated by HIVE component. What am I wrong?

hive.JPG

 

 

 

Posts: 1,894
Kudos: 433
Solutions: 303
Registered: ‎07-31-2013

Re: constraints on a hive table after sqooping

Recently upstream hive versions do support that keyword at table creation
but do not enforce it on the data end. They leave it up to you to maintain
that, so I wouldn't rely on its use with sqoop.

The feature currently exists as a way to explicitly specify such a data
relationship (externally maintained of course, also mentioned on the quoted
HQL doc link) to the query optimizer and planner so it can make better
decisions. See https://issues.apache.org/jira/browse/HIVE-13076 for more
details on the design and scope.

I'd recommend using Kudu instead for it's ease of use in this area (it
enforces primary keys).
Announcements
New solutions