All,
I am working on importing 1.2 billion rows from a DB2 table. The table has a composite primary index with at least 5 columns, so I need to specify the --split-by column manually (since Sqoop does not support a multi-column split-by). I ran the Sqoop import using one of the numeric columns from the index as the split key, and the import ran for almost 8 hours.
I have been advised to try a different split-by key, but now I am unsure how to choose the column for it:
1) Is it supposed to be numeric, or is varchar fine too?
2) Can it contain nulls?
3) I suppose the distribution of values in the column should be as even as possible, but does the number of mappers I choose (--num-mappers ##) matter as well?
Are there any other criteria I should pay attention to?
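To make question 3 concrete: as far as I understand it, Sqoop queries the MIN and MAX of the split-by column and cuts that range into equal-width intervals, one per mapper. The Python sketch below is only my illustration of that idea, not Sqoop's actual code; the function names and the sample values are made up. It counts how many rows each mapper would get for an evenly spread column versus one with a few large outliers.

```python
from bisect import bisect_right

def equal_width_splits(lo, hi, num_mappers):
    """Cut [lo, hi] into num_mappers equal-width ranges and return the
    boundary values (illustrative approximation of a MIN/MAX based split)."""
    width = (hi - lo) / num_mappers
    return [lo + round(i * width) for i in range(num_mappers)] + [hi]

def rows_per_mapper(values, num_mappers):
    """Count how many values fall into each equal-width range."""
    bounds = equal_width_splits(min(values), max(values), num_mappers)
    counts = [0] * num_mappers
    for v in values:
        # The last range also includes the maximum value itself.
        idx = min(bisect_right(bounds, v) - 1, num_mappers - 1)
        counts[idx] += 1
    return bounds, counts

# Hypothetical sample data: 1000 keys spread evenly, and 1000 keys where
# ten outliers stretch the MIN..MAX range far beyond the bulk of the rows.
even = list(range(1, 1001))
skewed = list(range(1, 991)) + [100000 + i for i in range(10)]

for name, data in (("even", even), ("skewed", skewed)):
    bounds, counts = rows_per_mapper(data, 4)
    print(name, "bounds:", bounds, "rows per mapper:", counts)
```

With the skewed sample, one of the four ranges receives 990 of the 1000 rows and two receive none, so adding mappers would not help much unless the split column is spread evenly between its minimum and maximum. If my understanding of the splitting is wrong, please correct me.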
Thanks,
Abhijeet