Created 04-11-2016 02:50 PM
Can a non-numeric column be specified for a --split-by key parameter? What are the potential issues in doing so?
Created 04-11-2016 03:13 PM
No, it must be numeric. According to the specs: "By default sqoop will use query select min(<split-by>), max(<split-by>) from <table name> to find out boundaries for creating splits." The alternative is to use --boundary-query, which also requires numeric columns. Otherwise the Sqoop job will fail. If your table has no such column, the only workaround is to use a single mapper: "-m 1".
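To make the min/max idea concrete, here is a rough sketch of how a numeric range could be cut into one WHERE clause per mapper. It is a simplified illustration only, not Sqoop's actual splitter code; the column name id, the values 1 and 100, and the mapper count 4 are made-up example values.

```shell
#!/bin/sh
# Simplified illustration of min/max range splitting; NOT Sqoop's actual code.
# min_id, max_id and mappers are hypothetical example values, standing in for
# the result of: select min(id), max(id) from <table name>
min_id=1
max_id=100
mappers=4

# Print one WHERE clause per mapper for the range [lo, hi] cut into n pieces.
split_ranges() {
  lo=$1; hi=$2; n=$3
  step=$(( (hi - lo) / n ))
  [ "$step" -lt 1 ] && step=1
  start=$lo
  i=1
  while [ "$i" -lt "$n" ] && [ $(( start + step )) -lt "$hi" ]; do
    end=$(( start + step ))
    echo "id >= $start AND id < $end"
    start=$end
    i=$(( i + 1 ))
  done
  # The last range is closed so that the maximum value is included.
  echo "id >= $start AND id <= $hi"
}

split_ranges "$min_id" "$max_id" "$mappers"
```

With these example values the sketch prints four ranges, the first being "id >= 1 AND id < 25" and the last "id >= 73 AND id <= 100". This kind of arithmetic is why a numeric column is the natural fit for --split-by.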
Created 11-03-2016 01:10 PM
The answer above is outdated. It is possible to use a character column as the split-by column.
You only need to add -Dorg.apache.sqoop.splitter.allow_text_splitter=true
directly after 'sqoop job' in your statement, like this:
sqoop job -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --create ${JOB_NAME} \
  -- \
  import \
  --connect "${JDBC}" \
  --username ${SOURCE_USR} \
  --password-file ${PWD_FILE_PATH} \
There is no guarantee, though, that Sqoop will split your records evenly across your mappers.
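As a rough way to see why text keys can split unevenly, the sketch below counts how many keys start with each letter. It does not reproduce Sqoop's text splitter; the key list is hypothetical sample data, and in practice you would look at the real values of your split-by column.

```shell
#!/bin/sh
# Hypothetical sample keys; real values would come from the split-by column.
keys="aaron adam alice anna bob zoe"

# Count how many keys start with each letter and print "letter=count".
first_letter_counts() {
  for k in $1; do
    printf '%s\n' "$k" | cut -c1
  done | sort | uniq -c | awk '{ print $2 "=" $1 }'
}

first_letter_counts "$keys"
```

If most keys cluster under one prefix, as with the "a" names in this sample, a mapper that is handed that part of the key range would likely receive most of the rows.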
Created 07-09-2018 01:29 AM
For a huge number of rows, the above option will cause duplicates in the result set.
Created 07-11-2018 02:23 PM
Thank you @Krish E, have you sorted it out yet? I am having the same issue. What is your table's size?