Sqoop --split-by on a string /varchar column

Contributor

Can a non-numeric column be specified for a --split-by key parameter? What are the potential issues in doing so?

Re: Sqoop --split-by on a string /varchar column

No, it must be numeric. According to the specs: "By default sqoop will use query select min(<split-by>), max(<split-by>) from <table name> to find out boundaries for creating splits." The alternative is to use --boundary-query, which also requires numeric columns; otherwise the Sqoop job will fail. If your table has no such column, the only workaround is to use a single mapper: "-m 1".
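
For reference, the single-mapper fallback could look roughly like the sketch below. The connection string, user, password file, table name, and target directory are all hypothetical placeholders that do not come from this thread. The function only prints the command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# All values below are hypothetical placeholders, not taken from the thread.
JDBC="jdbc:mysql://db.example.com/shop"
SOURCE_USR="etl_user"
PWD_FILE_PATH="/user/etl/.db_password"
TABLE="customers"
TARGET_DIR="/data/raw/customers"

# Print the single-mapper import command instead of running it,
# so it can be checked before being executed on a cluster.
single_mapper_import_cmd() {
  echo "sqoop import" \
    "--connect $JDBC" \
    "--username $SOURCE_USR" \
    "--password-file $PWD_FILE_PATH" \
    "--table $TABLE" \
    "--target-dir $TARGET_DIR" \
    "-m 1"   # one mapper: no split column and no boundary query needed
}

single_mapper_import_cmd
```

With one mapper the import is not parallelized, so this is only practical when the table is small enough to be read over a single connection.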

Re: Sqoop --split-by on a string /varchar column

Contributor

The answer above is outdated. It is possible to use a character column as the split-by column.

You only need to add -Dorg.apache.sqoop.splitter.allow_text_splitter=true

after your 'sqoop job' statement, like this:

sqoop job -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --create ${JOB_NAME} \
    -- \
    import \
    --connect "${JDBC}" \
    --username ${SOURCE_USR} \
    --password-file ${PWD_FILE_PATH} \

There is no guarantee, though, that Sqoop will split your records evenly across your mappers.
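
A fuller one-off import with a text split column might look like the sketch below. The table name, split column, target directory, mapper count, and connection details are hypothetical and were not part of the original post. As above, the function only prints the command rather than running it.

```shell
#!/bin/sh
# All values below are hypothetical placeholders, not taken from the thread.
JDBC="jdbc:mysql://db.example.com/shop"
SOURCE_USR="etl_user"
PWD_FILE_PATH="/user/etl/.db_password"
TABLE="customers"
SPLIT_COL="customer_code"   # assumed to be a VARCHAR column
TARGET_DIR="/data/raw/customers"
NUM_MAPPERS=4

# Print the import command instead of running it.
text_split_import_cmd() {
  # Generic Hadoop arguments such as -D must come directly after the
  # tool name, before any tool-specific arguments.
  echo "sqoop import" \
    "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
    "--connect $JDBC" \
    "--username $SOURCE_USR" \
    "--password-file $PWD_FILE_PATH" \
    "--table $TABLE" \
    "--split-by $SPLIT_COL" \
    "--target-dir $TARGET_DIR" \
    "-m $NUM_MAPPERS"
}

text_split_import_cmd
```

If the text values cluster under a few prefixes, some mappers may receive far more rows than others, so it is worth comparing the sizes of the output files after a first run.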

Re: Sqoop --split-by on a string /varchar column

Explorer

For a huge number of rows, the above option will cause duplicates in the result set.

Re: Sqoop --split-by on a string /varchar column

Explorer

Thank you @Krish E. Did you manage to sort it out? I am having the same issue. What is your table's size?
