Sqoop ignoring number of mappers

Contributor

Many times when I run a Sqoop command, it ignores the number of mappers I tell it to use.

Common examples are:

  1. --num-mappers 2 and it uses 4 instead
  2. --num-mappers 8 and it uses 10 instead

Also, Sqoop sometimes creates output files that have no data in them.

Any ideas about these issues?

Re: Sqoop ignoring number of mappers

Rising Star

@Josh Persinger

Are you using the --direct flag by any chance? Can you post your Sqoop command?

Re: Sqoop ignoring number of mappers

New Contributor

If we use more than one mapper, does that mean that many connections will be established on the source side?

Re: Sqoop ignoring number of mappers

New Contributor

Will there be any change in processing time if we increase the number of mappers while using --direct mode in an export? Can you explain, please?

Re: Sqoop ignoring number of mappers

Rising Star

@Josh Persinger

Can you try the --query option in Sqoop?

Example:

sqoop import \
  --driver <your driver name> \
  --connect <CONNECTION STRING>/DATABASE=<DB NAME> \
  --query "select * from <TABLE NAME> where \$CONDITIONS" \
  --fields-terminated-by "," \
  --split-by <SPLIT COLUMN> \
  --target-dir '<SOME TMP DIRECTORY>' \
  --hive-import \
  --hive-table <TABLE NAME> \
  -m <NUMBER OF MAPPERS>

Re: Sqoop ignoring number of mappers

Contributor

@Josh Persinger

The -m or --num-mappers option is only a hint to the engine about the degree of parallelism to maintain. It is not mandatory for the job to launch exactly that number of tasks every time. The mapper count may vary based on your input data. The Sqoop client serializes the data, generates the deserializer, sets the InputFormat, and submits the job to be run. The InputFormat may be what controls the number of mappers, as happens in normal text file processing. This may also answer your second question: some of the launched mappers may find no data in their split, so they have nothing to write.
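One possible explanation for the empty files can be sketched in a few lines of shell. By default Sqoop divides the range between the lowest and highest value of the split column into evenly sized pieces, one per mapper, so a skewed column can leave some pieces with no rows. The functions below (split_ranges and rows_per_split) are hypothetical helpers written for illustration, not Sqoop's actual splitter code, and the ids are made-up sample values.

```shell
#!/usr/bin/env bash
# Illustration only: this is NOT Sqoop's splitter, just equal-width range
# arithmetic that mimics the idea of splitting on a numeric column.

# Print "lo hi" for each of num_splits equal-width pieces of [min, max].
split_ranges() {
  local min=$1 max=$2 num_splits=$3
  local span=$(( max - min + 1 ))
  local i lo hi
  for (( i = 0; i < num_splits; i++ )); do
    lo=$(( min + span * i / num_splits ))
    hi=$(( min + span * (i + 1) / num_splits - 1 ))
    echo "$lo $hi"
  done
}

# Count how many of the given ids fall into each range.
# Arguments: min max num_splits id...
rows_per_split() {
  local min=$1 max=$2 num_splits=$3
  shift 3
  local lo hi id count
  split_ranges "$min" "$max" "$num_splits" | while read -r lo hi; do
    count=0
    for id in "$@"; do
      if (( id >= lo && id <= hi )); then
        count=$(( count + 1 ))
      fi
    done
    echo "[$lo, $hi] rows=$count"
  done
}

# Skewed sample ids: five small values and one outlier at 1000.
rows_per_split 1 1000 4 1 2 3 4 5 1000
```

With these sample values the two middle ranges, [251, 500] and [501, 750], report rows=0. A mapper assigned such a range would have nothing to import, which would be consistent with an empty output file. Choosing a more evenly distributed --split-by column is the usual way to avoid this.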

Re: Sqoop ignoring number of mappers

Contributor

@Josh Persinger,

If we specify -m [1 or n], Sqoop always launches the number of map tasks specified with the -m option.

If we don't specify -m at all, it launches 4 map tasks by default.
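A quick way to check how many map tasks actually wrote output, and how many of those files are empty, is to count the part-m-* files in the target directory. The sketch below assumes the usual `hdfs dfs -ls` column layout, with the file size in the fifth column and the path in the last. The function name summarize_parts, the path /tmp/sqoop_out, and the sample listing are all made up for illustration.

```shell
#!/usr/bin/env bash
# Reads an `hdfs dfs -ls`-style listing on stdin and reports how many
# part-m-* files it contains and how many of them have size 0.
# Assumes the size is field 5 and the path is the last field.
summarize_parts() {
  awk '$NF ~ /\/part-m-[0-9]+/ { total++; if ($5 == 0) empty++ }
       END { printf "part files: %d, empty: %d\n", total, empty }'
}

# Made-up sample listing, standing in for real `hdfs dfs -ls` output.
summarize_parts <<'EOF'
Found 3 items
-rw-r--r--   3 user hdfs          0 2019-01-01 10:00 /tmp/sqoop_out/_SUCCESS
-rw-r--r--   3 user hdfs       1024 2019-01-01 10:00 /tmp/sqoop_out/part-m-00000
-rw-r--r--   3 user hdfs          0 2019-01-01 10:00 /tmp/sqoop_out/part-m-00001
EOF
```

On a real cluster the same function could be fed from `hdfs dfs -ls <target-dir> | summarize_parts`. Comparing the reported count with the value passed to -m shows whether the job really used a different number of mappers.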