Below is my scenario.
The table in my database has three records.
When I import this table, the boundary values of the primary key (id) are 2 (min) and 6 (max). The splits are based on this range ([2, 3, 4, 5, 6]), and the import runs with 5 mappers.
I am wondering how Sqoop ends up with 5 mappers, since the default is 4. I am not specifying the number of mappers in the Sqoop statement, so it should default to 4. Below is my Sqoop statement:
$ sqoop import --connect jdbc:mysql://localhost/test --username test -P --table employee
If your table has a primary key, Sqoop identifies it by default and splits on that key column.
Sqoop needs a criterion by which to split the workload, and it uses a splitting column for this. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range.
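To make the range arithmetic concrete, here is a simplified Python sketch of how an integer splitting column could be carved into per-mapper ranges. It is loosely modeled on the integer split logic in older Hadoop/Sqoop IntegerSplitter implementations (step through the range by (max - min) / mappers, then append max if the steps did not land on it exactly). That model, and the helper names split_points and split_ranges, are assumptions for illustration only; this is not Sqoop's API, and the exact algorithm differs between Sqoop versions.

```python
def split_points(min_val: int, max_val: int, num_splits: int = 4) -> list[int]:
    """Boundary points used to divide [min_val, max_val] into ranges.

    Assumed model: step by (max - min) // num_splits (at least 1), and
    append max_val if the stepping did not land on it exactly.
    """
    num_splits = max(num_splits, 1)
    split_size = max((max_val - min_val) // num_splits, 1)
    points = []
    cur = min_val
    while cur <= max_val:
        points.append(cur)
        cur += split_size
    if points[-1] != max_val or len(points) == 1:
        # Range did not divide cleanly (or min == max): add one more point,
        # which produces one extra range, i.e. one extra map task.
        points.append(max_val)
    return points


def split_ranges(col: str, min_val: int, max_val: int, num_splits: int = 4) -> list[str]:
    """Turn consecutive split points into one WHERE clause per map task."""
    points = split_points(min_val, max_val, num_splits)
    clauses = []
    for i in range(1, len(points)):
        lo, hi = points[i - 1], points[i]
        if i == len(points) - 1:
            # Last range is closed so that max_val itself is imported.
            clauses.append(f"{col} >= {lo} AND {col} <= {hi}")
        else:
            clauses.append(f"{col} >= {lo} AND {col} < {hi}")
    return clauses


if __name__ == "__main__":
    # Default of 4 mappers; the first two ranges come from this thread.
    for lo, hi in [(2, 6), (1, 117), (1, 10)]:
        ranges = split_ranges("id", lo, hi)
        print(f"{lo}..{hi}: {len(ranges)} ranges")
        for clause in ranges:
            print("   ", clause)
```

Under this model, both 2..6 and 1..117 produce four ranges, while 1..10 produces five. Note that the five boundary points [2, 3, 4, 5, 6] define four ranges, not five; an extra range only appears when (max - min) is not an exact multiple of the mapper count. If your job reports a different number of splits, the logic in your Sqoop release likely differs from this sketch.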
Thanks for your response.
Yes, in this case the primary key column exists and Sqoop is picking up the primary key (id). However, since I am not passing the number of mappers, it should default to 4 mappers; instead it is using 5 mappers for just 3 records.
Whenever we don't specify the number of mappers, Sqoop usually uses 4. In this case, that is not happening.
My question: why is it running with 5 splits/mappers?
I have also tested it with a larger primary key range (from 1 (min) to 117 (max)) without specifying the number of mappers, and in that case it uses the default 4 mappers.