Support Questions
Find answers, ask questions, and share your expertise
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

constraints on a hive table after sqooping




 I would like to know that when I am sqooping an oracle table with primary key and importing it to hive, the constraints like primary key remain at the hive table ? I am using a command like the following: 

sqoop import --connect ... --table MY_TABLE --hive-import  --create-hive-table ....


Thans in advance.


Master Guru
Two things here:

First, yes, Sqoop will copy only what data comes through the connection and the query, and will not duplicate data as part of its import process. The divided tasks are each fully re-done with no partial results kept around if there is a failure/retry/speculative execution during its run.

However, keep in mind that Hive has no such constraints in its own architecture (no concept of a primary key). So after your import, it is upto your use of the table and the updates you make to it to maintain that 'effect'.

You can consider using Kudu + Impala instead of Hive if the notion of primary key(s) is important to your use, although Sqoop doesn't offer a way to directly import data to it (You'll need to insert out of the Hive table into the Kudu one via Impala, after the Sqoop import to Hive is done).


@Harsh J  In the link of the official documentation of Hive QL below HIVE QL  

PRIMARY and FOREIGN constraints have been added. When you do sqoop importing to HIVE the new table is manipulated by HIVE component. What am I wrong?





Master Guru
Recently upstream hive versions do support that keyword at table creation
but do not enforce it on the data end. They leave it up to you to maintain
that, so I wouldn't rely on its use with sqoop.

The feature currently exists as a way to explicitly specify such a data
relationship (externally maintained of course, also mentioned on the quoted
HQL doc link) to the query optimizer and planner so it can make better
decisions. See for more
details on the design and scope.

I'd recommend using Kudu instead for it's ease of use in this area (it
enforces primary keys).