Created on 09-26-2018 02:44 AM - edited 09-16-2022 06:45 AM
In a Sqoop import, how does MapReduce work with key-value pairs when the source is an RDBMS table with structured data?
Please explain.
Created 09-27-2018 02:20 AM
Created 09-27-2018 02:49 AM
Can you point me to the code in Cloudera's Sqoop that determines the split range for each mapper from the split-by column?
Created 10-07-2018 08:48 PM
In an RDBMS the block size is 8 KB, and in Hadoop the block size is 64 MB. In my Sqoop import example, the RDBMS table is 300 MB. So will it be split across 5 mappers? Please confirm.
Created 10-22-2018 05:49 PM
I think the default block size is 128 MB. In any case, block size is not what determines the number of mappers in Sqoop.
The number of mappers is set by the --num-mappers (-m) parameter of sqoop import; if you omit it, Sqoop uses 4. Sqoop splits on the table's primary key by default, so you need --split-by <column-name> when the table has no primary key or when you want to split on a different column. Sqoop finds the min and max values of the split column and divides that range into --num-mappers intervals, one per mapper. It is best to split on the primary key, or on another high-cardinality column whose values are evenly distributed, so that the mappers are balanced.
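To make the min/max division concrete, here is a minimal sketch in Python. It is not Sqoop's actual code (Sqoop is written in Java and does this inside its own splitter classes); the function names split_ranges and where_clauses are hypothetical, and the sketch assumes an integer split column whose min and max have already been fetched with a query like SELECT MIN(id), MAX(id).

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide the inclusive range [min_val, max_val] into contiguous
    sub-ranges, one per mapper (hypothetical helper, not Sqoop's API)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        raise ValueError("max_val must be >= min_val")

    total = max_val - min_val + 1
    # Never create more ranges than there are distinct integer values.
    n = min(num_mappers, total)
    base, extra = divmod(total, n)

    ranges = []
    lo = min_val
    for i in range(n):
        # The first `extra` ranges get one additional value each.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def where_clauses(column, ranges):
    """Build the per-mapper filter each mapper would add to its query."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


if __name__ == "__main__":
    # Example: id runs from 1 to 1000 and --num-mappers is 4.
    for clause in where_clauses("id", split_ranges(1, 1000, 4)):
        print(clause)
    # id >= 1 AND id <= 250
    # id >= 251 AND id <= 500
    # id >= 501 AND id <= 750
    # id >= 751 AND id <= 1000
```

Note that the ranges are equal in width, not in row count. If most rows fall between id 1 and 250, the first mapper does most of the work, which is why an evenly distributed split column matters. The table size (300 MB) and the block size do not appear anywhere in the calculation.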