Sqoop Import - while importing the data, the splits are not performed based on default mappers(4).

Below is my scenario.

In a database table, I have the following three records.

10572-db.png

When I import this table, the boundary values are 2 to 6 (id is the primary key). However, the import is split into 5 mappers, apparently one per value in the boundary range ([2,3,4,5,6]).

I am wondering why Sqoop splits this into 5 mappers, since the default is 4. I am not specifying the number of mappers in the Sqoop statement, so it should default to 4 mappers. Below is my Sqoop statement:

$ sqoop import --connect jdbc:mysql://localhost/test --username test -P --table employee

2 REPLIES

Re: Sqoop Import - while importing the data, the splits are not performed based on default mappers(4).

@Praveen PentaReddy

If your table has a primary key, Sqoop will by default identify it and split on that key column.

Sqoop needs a criterion by which it can split the workload. Sqoop uses a splitting column to split the workload. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range.

Source: https://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html
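
As a rough illustration of the "evenly-sized components of the total range" idea in the quoted documentation, here is a minimal Python sketch. It is not Sqoop's actual splitter code: the function name `even_splits` and the way the remainder is distributed are assumptions made for illustration, and the real number of splits can depend on the Sqoop version and on the type of the splitting column.

```python
def even_splits(lo, hi, num_mappers=4):
    """Partition the inclusive integer range [lo, hi] into at most
    num_mappers contiguous, roughly equal sub-ranges.

    Hypothetical helper for illustration only; not Sqoop's implementation.
    """
    if hi < lo:
        raise ValueError("hi must be >= lo")
    total = hi - lo + 1                  # number of integer key values in the range
    parts = min(num_mappers, total)      # never create an empty sub-range
    base, extra = divmod(total, parts)   # the first `extra` ranges get one more value
    ranges = []
    start = lo
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Boundary values from the question (min id = 2, max id = 6), default of 4 mappers.
print(even_splits(2, 6))      # [(2, 3), (4, 4), (5, 5), (6, 6)]
# The larger range mentioned later in the thread (min 1, max 117).
print(even_splits(1, 117))    # [(1, 30), (31, 59), (60, 88), (89, 117)]
```

Under this simplified model both ranges produce four sub-ranges, so even partitioning by itself does not account for the five splits reported in the question.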

Re: Sqoop Import - while importing the data, the splits are not performed based on default mappers(4).

Thanks for your response.

Yes, in this case a primary key column exists and Sqoop is picking up the primary key (id). However, since I am not passing the number of mappers, it should default to 4 mappers; instead it is using 5 mappers for just 3 records.

Whenever we don't specify the number of mappers, Sqoop usually uses 4 mappers. In this case, that is not happening.

My question: why is it running with 5 splits/mappers?

I have also tested a larger primary key range (min 1 to max 117) without providing the number of mappers, and there it still uses 4 mappers.
