Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

constraints on a hive table after sqooping

constraints on a hive table after sqooping

Contributor

Hello, 

 

 I would like to know that when I am sqooping an oracle table with primary key and importing it to hive, the constraints like primary key remain at the hive table ? I am using a command like the following: 

sqoop import --connect ... --table MY_TABLE --hive-import  --create-hive-table ....

 

Thans in advance.

3 REPLIES 3

Re: constraints on a hive table after sqooping

Master Guru
Two things here:

First, yes, Sqoop will copy only what data comes through the connection and the query, and will not duplicate data as part of its import process. The divided tasks are each fully re-done with no partial results kept around if there is a failure/retry/speculative execution during its run.

However, keep in mind that Hive has no such constraints in its own architecture (no concept of a primary key). So after your import, it is upto your use of the table and the updates you make to it to maintain that 'effect'.

You can consider using Kudu + Impala instead of Hive if the notion of primary key(s) is important to your use, although Sqoop doesn't offer a way to directly import data to it (You'll need to insert out of the Hive table into the Kudu one via Impala, after the Sqoop import to Hive is done).
Highlighted

Re: constraints on a hive table after sqooping

Contributor

@Harsh J  In the link of the official documentation of Hive QL below HIVE QL  

PRIMARY and FOREIGN constraints have been added. When you do sqoop importing to HIVE the new table is manipulated by HIVE component. What am I wrong?

hive.JPG

 

 

 

Re: constraints on a hive table after sqooping

Master Guru
Recently upstream hive versions do support that keyword at table creation
but do not enforce it on the data end. They leave it up to you to maintain
that, so I wouldn't rely on its use with sqoop.

The feature currently exists as a way to explicitly specify such a data
relationship (externally maintained of course, also mentioned on the quoted
HQL doc link) to the query optimizer and planner so it can make better
decisions. See https://issues.apache.org/jira/browse/HIVE-13076 for more
details on the design and scope.

I'd recommend using Kudu instead for it's ease of use in this area (it
enforces primary keys).
Don't have an account?
Coming from Hortonworks? Activate your account here