In a Sqoop import, how does MapReduce work with key/value pairs when the source is an RDBMS table holding structured data?
Can you point me to any code in Cloudera's Sqoop that shows how the split-by column is used to determine the range each mapper reads?
In an RDBMS the block size is 8 KB, and in Hadoop the block size is 64 MB. In my Sqoop import example, the RDBMS table is 300 MB. So will it be split across 5 mappers? Please confirm.
I think the default HDFS block size is 128 MB (since Hadoop 2.x). But either way, block size is not what determines the number of mappers in Sqoop.
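For comparison, here is the block-count arithmetic the question has in mind, as a small Java sketch (the class and method names are made up for illustration). Dividing 300 MB into 64 MB blocks needs 5 blocks after rounding up, and 128 MB blocks would need 3. That is how a file already stored in HDFS is divided into blocks; a Sqoop import from a database does not use this calculation to pick its mapper count.

```java
// Hypothetical illustration of the block-count arithmetic from the question.
// This is how a file stored in HDFS is divided into blocks; it is NOT how
// Sqoop chooses the number of mappers for a database import.
public class BlockCountSketch {

    // Number of blocks needed to hold sizeMb, rounding up (ceiling division).
    static long blocksFor(long sizeMb, long blockSizeMb) {
        return (sizeMb + blockSizeMb - 1) / blockSizeMb;
    }

    public static void main(String[] args) {
        System.out.println("64 MB blocks:  " + blocksFor(300, 64));  // 5
        System.out.println("128 MB blocks: " + blocksFor(300, 128)); // 3
    }
}
```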
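The number of mappers depends on the --num-mappers (-m) parameter you pass to sqoop import; if you don't set it, Sqoop uses 4. Sqoop also needs a column to split on: by default it uses the table's primary key, and if the table has no primary key you must specify --split-by <column-name> (or import with a single mapper, -m 1). Sqoop queries the MIN and MAX values of that column and divides the range between them into --num-mappers pieces, one per mapper. It is best to use the primary key, or any column with high cardinality and evenly distributed values, as the split-by column so that your mappers stay balanced.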
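To make the MIN/MAX idea concrete, here is a simplified Java sketch of range-based splitting on an integer split-by column. It is not Sqoop's source code: as far as I know, the real logic lives in classes such as DataDrivenDBInputFormat and IntegerSplitter (package org.apache.sqoop.mapreduce.db, formerly com.cloudera.sqoop), which also handle NULLs and non-integer column types. The table name orders, the column id, and the MIN/MAX values 1 and 1000 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch of range-based splitting on an integer
// split-by column. Not Sqoop's actual implementation.
public class SplitSketch {

    // One mapper's share of the value range. Every split is [lo, hi) except
    // the last one, which is [lo, hi] so that the maximum value is included.
    static final class Split {
        final long lo;
        final long hi;
        final boolean last;

        Split(long lo, long hi, boolean last) {
            this.lo = lo;
            this.hi = hi;
            this.last = last;
        }

        // WHERE clause this mapper would add to its SELECT.
        String whereClause(String col) {
            return col + " >= " + lo + " AND " + col + (last ? " <= " : " < ") + hi;
        }
    }

    // Divide the VALUE range [min, max] (not the row count) into numMappers pieces.
    static List<Split> split(long min, long max, int numMappers) {
        long range = max - min;
        long step = range / numMappers; // whole part of range / numMappers
        long rem = range % numMappers;  // leftover, spread across the splits
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            // Equals min + floor(range * i / numMappers) without computing range * i.
            long lo = min + step * i + rem * i / numMappers;
            long hi = min + step * (i + 1) + rem * (i + 1) / numMappers;
            splits.add(new Split(lo, hi, i == numMappers - 1));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Pretend "SELECT MIN(id), MAX(id) FROM orders" returned 1 and 1000.
        for (Split s : split(1, 1000, 4)) {
            System.out.println(s.whereClause("id"));
        }
    }
}
```

With 4 mappers this prints the ranges id >= 1 AND id < 250, id >= 250 AND id < 500, id >= 500 AND id < 750, and id >= 750 AND id <= 1000. Because the split is by value range rather than by row count, a skewed column (say, most rows having id < 250) would leave the first mapper doing most of the work, which is why an evenly distributed split-by column matters.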