Sqoop --split-by on a string/varchar column


Can a non-numeric column be specified for a --split-by key parameter? What are the potential issues in doing so?


No, it must be numeric, because according to the specs: "By default sqoop will use query select min(<split-by>), max(<split-by>) from <table name> to find out boundaries for creating splits." The alternative is to use --boundary-query, which also requires numeric columns. Otherwise the Sqoop job will fail. If you don't have such a column in your table, the only workaround is to use a single mapper: -m 1.
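The boundary query quoted above returns a minimum and a maximum, and the range between them is divided among the mappers. A rough sketch of that idea in shell follows, assuming an integer column named id (a hypothetical name) and equally sized ranges. It only illustrates the principle and is not Sqoop's actual splitter code:

```shell
#!/bin/sh
# Illustration only: divide the inclusive range [min, max] into at most n
# contiguous ranges, one WHERE condition per mapper.
# The column name "id" is a hypothetical placeholder.
split_ranges() {
  min=$1
  max=$2
  n=$3
  span=$(( max - min + 1 ))
  size=$(( (span + n - 1) / n ))   # ceiling division: rows of key space per range
  lo=$min
  while [ "$lo" -le "$max" ]; do
    hi=$(( lo + size - 1 ))
    if [ "$hi" -gt "$max" ]; then
      hi=$max
    fi
    echo "id >= $lo AND id <= $hi"
    lo=$(( hi + 1 ))
  done
}

# Example: boundaries 1 and 100 shared by 4 mappers
split_ranges 1 100 4
```

If the values are not spread uniformly between the minimum and the maximum, some of these ranges hold far more rows than others, which is why the choice of split column matters.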


The answer above is outdated. It is possible to use a character column as the split-by column.

You only need to add -Dorg.apache.sqoop.splitter.allow_text_splitter=true

directly after 'sqoop job' in your statement, like this:

sqoop job -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --create ${JOB_NAME} \
    -- \
    import \
    --connect "${JDBC}" \
    --username ${SOURCE_USR} \
    --password-file ${PWD_FILE_PATH} \

There is no guarantee, though, that Sqoop splits your records evenly across your mappers.
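For a plain one-off import rather than a saved job, a complete command with the property set might look like the sketch below. TABLE, SPLIT_COL, TARGET_DIR and every default value are hypothetical placeholders, and the mapper count of 4 is arbitrary. The script prints the command by default, so it can be reviewed before running it with DRY_RUN=0:

```shell
#!/bin/sh
# Sketch only. All default values below are hypothetical placeholders.
JDBC="${JDBC:-jdbc:mysql://dbhost/sales}"
SOURCE_USR="${SOURCE_USR:-etl_user}"
PWD_FILE_PATH="${PWD_FILE_PATH:-/user/etl_user/.db.password}"
TABLE="${TABLE:-customers}"
SPLIT_COL="${SPLIT_COL:-customer_code}"      # a string/varchar column
TARGET_DIR="${TARGET_DIR:-/tmp/customers_import}"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

import_by_text_column() {
  # The -D property is a generic argument, so it goes right after the
  # tool name and before the tool-specific options.
  run sqoop import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --connect "${JDBC}" \
    --username "${SOURCE_USR}" \
    --password-file "${PWD_FILE_PATH}" \
    --table "${TABLE}" \
    --split-by "${SPLIT_COL}" \
    --num-mappers 4 \
    --target-dir "${TARGET_DIR}"
}

import_by_text_column
```

After a real run, comparing the sizes of the output files under the target directory gives a quick impression of how evenly the text column was split.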


For a huge number of rows, the above option can cause duplicates in the result set.


Thank you @Krish E, have you sorted it out yet? I am having the same issue. What is the size of your table?

