SQOOP - Split By Key Manual

All,

I am importing 1.2 billion rows from a DB2 table. The table has a composite primary index of at least 5 columns, so I need to specify the --split-by column manually (since Sqoop does not support a multi-column split-by). I ran the Sqoop import with one of the numeric columns from the index, and the import took almost 8 hours.

I have been advised to try a different split-by key, but now I have a question about how to choose the split-by column:

1) Does it have to be numeric, or is varchar fine too?

2) Can it contain nulls?

3) I suppose the distribution of values in the column should be as even as possible, but does it matter how many mappers I choose (--num-mappers ##)?

Are there any other criteria to pay attention to?

Thanks,

Abhijeet

1 ACCEPTED SOLUTION

@Abhijeet Rajput

Numeric is preferred, which is what you are already doing. With a numeric column you don't run into case-sensitivity issues (for example, your database sorting records in a case-insensitive way). Do you have a column that is unique but not part of the primary key? Even distribution is important because otherwise your Sqoop job can be skewed. The number of mappers definitely matters if you have slots available: more mappers means more parallelism and a faster job. See the following link if you haven't already:

http://stackoverflow.com/questions/37206232/sqoop-import-composite-primary-key-and-textual-primary-k...
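
To illustrate why even distribution matters: for a numeric split-by column, Sqoop looks up the MIN and MAX of the column and cuts that range into equal-width slices, one per mapper. The Python sketch below only mimics that idea on a small list of values; it is not Sqoop's actual code, and the function names `split_boundaries` and `rows_per_mapper` are made up for this example.

```python
from bisect import bisect_right
from collections import Counter


def split_boundaries(lo, hi, num_mappers):
    """Cut the range [lo, hi] into num_mappers equal-width slices.

    Returns num_mappers + 1 cut points, starting at lo and ending at hi.
    """
    cuts = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers)]
    cuts.append(hi)
    return cuts


def rows_per_mapper(values, num_mappers):
    """Count how many values land in each mapper's slice."""
    cuts = split_boundaries(min(values), max(values), num_mappers)
    inner = cuts[1:-1]  # the interior cut points decide which mapper gets a value
    counts = Counter(bisect_right(inner, v) for v in values)
    return [counts.get(i, 0) for i in range(num_mappers)]


# Evenly spread keys: every mapper gets roughly the same share.
even = list(range(1, 1001))
print(rows_per_mapper(even, 4))    # [249, 250, 250, 251]

# Mostly small keys plus a few huge outliers: one mapper does nearly all the work.
skewed = list(range(1, 991)) + [1_000_000] * 10
print(rows_per_mapper(skewed, 4))  # [990, 0, 0, 10]
```

In the skewed case, adding more mappers would not help much, because the extra mappers would mostly receive empty slices. That is the situation to avoid when picking the split-by column.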
