Support Questions

learninghuman · ‎04-11-2016

Can a non-numeric column be specified for a --split-by key parameter? What are the potential issues in doing so?

pminovic · ‎04-11-2016

No, it must be numeric because according to the specs: "By default sqoop will use query select min(<split-by>), max(<split-by>) from <table name> to find out boundaries for creating splits." The alternative is to use --boundary-query which also requires numeric columns. Otherwise the Sqoop job will fail. If you don't have such a column in your table the only workaround is to use only 1 mapper: "-m 1".
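To make the min/max idea concrete, here is a rough sketch of how a numeric range can be cut into one contiguous slice per mapper. It is only an illustration, not Sqoop's actual splitter code: the function name `split_ranges` and the column name `id` are made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; this is NOT Sqoop's real implementation.
# split_ranges and the column name "id" are hypothetical.
# Divide the inclusive range [min, max] into contiguous ranges, one per mapper.
split_ranges() {
  local min=$1 max=$2 mappers=$3
  local total=$(( max - min + 1 ))
  local size=$(( (total + mappers - 1) / mappers ))  # ceiling division
  local lo=$min hi
  while [ "$lo" -le "$max" ]; do
    hi=$(( lo + size - 1 ))
    if [ "$hi" -gt "$max" ]; then hi=$max; fi          # clamp the last range
    echo "id >= $lo AND id <= $hi"
    lo=$(( hi + 1 ))
  done
}

# min=1 and max=1000 stand in for the result of the min/max boundary query.
split_ranges 1 1000 4
```

Each printed condition stands for the slice of rows one mapper would read. This kind of arithmetic is straightforward on numbers, which is why a numeric split column is the easy case.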

rene_sluiter · ‎11-03-2016

The answer is outdated. It is possible to use a character attribute as split-by attribute.

You only need to add -Dorg.apache.sqoop.splitter.allow_text_splitter=true

after your 'sqoop job' statement like this:

sqoop job -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --create ${JOB_NAME} \
    -- \
    import \
    --connect "${JDBC}" \
    --username ${SOURCE_USR} \
    --password-file ${PWD_FILE_PATH} \

There is no guarantee, though, that Sqoop splits your records evenly across your mappers.
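To see why an even split is not guaranteed with text keys, here is a simplified sketch. It assumes lowercase keys and divides the a-z range into four equal slices by first letter, which is a simplification and not how Sqoop's text splitter works in detail. The function name `count_per_bucket` and the sample keys are made up.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; Sqoop's real text splitter differs in detail.
# count_per_bucket and the sample keys are hypothetical.
# Assumes keys start with a lowercase letter a-z.
count_per_bucket() {
  local counts=(0 0 0 0) key first code idx
  for key in "$@"; do
    first=${key:0:1}
    code=$(printf '%d' "'$first")      # ASCII code of the first character
    idx=$(( (code - 97) * 4 / 26 ))    # 97 is 'a'; maps a-z onto buckets 0..3
    counts[idx]=$(( counts[idx] + 1 ))
  done
  echo "${counts[*]}"                  # rows per bucket, one bucket per mapper
}

# Most keys start with "s", so one bucket gets most of the rows.
count_per_bucket smith smythe sanders stone adams brown
```

With these sample keys the counts come out as `2 0 4 0`: two of the four mappers would get nothing while one does most of the work. Real data such as surnames or prefixed IDs tends to cluster the same way.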

elkrish · ‎07-09-2018

For a huge number of rows, the above option will cause duplicates in the result set.

axie · ‎07-11-2018

Thank you @Krish E, have you sorted it out? I am having the same issue. What is your table's size?

Cloudera Community

Sqoop --split-by on a string /varchar column