What is the best way to use Sqoop to import tables into Hive when these have no primary key?

New Contributor

Hi! I'm new to Hadoop and have just started learning about its ecosystem and tools.

Currently, I'm writing a batch script to migrate a source database into Hive. I want it to copy as much data as possible, and that includes tables that lack a primary key (like n-to-n relations). I don't mind if I have to create a new table with its own primary key in the process.

What would be the best procedure to do so? If Sqoop and Hive are not the best tools for such a job, should I consider something else? I'd be grateful for any advice I can get.

2 REPLIES

Re: What is the best way to use Sqoop to import tables into Hive when these have no primary key?

Expert Contributor

Hey there,

I'm assuming you're using Sqoop 1, since Sqoop 2 does not yet support importing into Hive. Sqoop uses the primary key mainly to split the import or export work across parallel mappers. For a table without one, you can specify the column to split on yourself via the --split-by argument.

-Abe
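
To illustrate the --split-by suggestion above, here is a minimal sketch of how a batch script might assemble the Sqoop 1 command line for one table. The function name, the JDBC URL, and the table and column names are made up for illustration; only the Sqoop flags themselves (--connect, --table, --hive-import, --split-by, --num-mappers) are real. Falling back to a single mapper when no split column is known is one possible choice, not something stated in the thread.

```python
def build_sqoop_import(jdbc_url, table, split_by=None, num_mappers=4):
    """Return the argument list for a Sqoop 1 import into Hive.

    Hypothetical helper: the name and defaults are assumptions.
    """
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--hive-import",
    ]
    if split_by:
        # Tell Sqoop which column to range-partition the work on.
        args += ["--split-by", split_by, "--num-mappers", str(num_mappers)]
    else:
        # Without a split column, a single mapper needs no partitioning.
        args += ["--num-mappers", "1"]
    return args


if __name__ == "__main__":
    # Example values are placeholders, not taken from the thread.
    cmd = build_sqoop_import("jdbc:mysql://dbhost/shop", "order_items", "order_id")
    print(" ".join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems with unusual table names.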

Re: What is the best way to use Sqoop to import tables into Hive when these have no primary key?

New Contributor

Hi Abe,

Sorry for taking so long to reply. Indeed, I am using Sqoop 1. After you explained the purpose of the primary key in the import job, I realized Sqoop wasn't the tool I needed, or at least not as it is. The idea of my project is to absorb as much data as possible from the source without user intervention, so it shouldn't rely on a manually chosen --split-by.

I worked around this by writing a job in Talend/bash that automatically finds the best column to split each table by, then runs a Sqoop import job for each table, splitting by that column.
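
The post does not say how the "best" column is chosen, so the following is only one possible heuristic, sketched under stated assumptions: per-column statistics have already been collected from the source database, and a good split column is a numeric one with no NULLs and as many distinct values as possible. The ColumnStats type, the NUMERIC_TYPES set, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed set of type names treated as numeric; adjust per source database.
NUMERIC_TYPES = {"int", "integer", "bigint", "smallint", "decimal", "numeric"}


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics gathered beforehand."""
    name: str
    sql_type: str
    distinct_count: int
    null_count: int


def choose_split_column(columns: Iterable[ColumnStats]) -> Optional[str]:
    """Pick a split column by an assumed heuristic, or None if none fits."""
    candidates = [
        c for c in columns
        # Assumption: skip columns containing NULLs, since such rows may
        # not fall into any of the value ranges generated for the split.
        if c.sql_type.lower() in NUMERIC_TYPES and c.null_count == 0
    ]
    if not candidates:
        return None  # caller could fall back to a single mapper
    # More distinct values should give more even ranges across mappers.
    return max(candidates, key=lambda c: c.distinct_count).name


if __name__ == "__main__":
    stats = [
        ColumnStats("order_id", "BIGINT", 9000, 0),
        ColumnStats("product_id", "INT", 350, 0),
        ColumnStats("note", "VARCHAR", 120, 40),
    ]
    print(choose_split_column(stats))
```

Distinct counts say nothing about skew, so a column that scores well here can still produce uneven splits.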