
How does MapReduce work in a Sqoop import?


In a Sqoop import, how does MapReduce work with key/value pairs when importing structured data from RDBMS tables?

Please explain.


Re: How does MapReduce work in a Sqoop import?

Apache Sqoop is open source, so you can check out what it does underneath whenever curiosity strikes.

Consider an import scenario with text output format.

Here's the mapper used for this: https://github.com/cloudera/sqoop/blob/cdh5.15.0-release/src/java/org/apache/sqoop/mapreduce/TextImp...
- Note that the key/value input types it consumes are LongWritable and SqoopRecord
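For a sense of how small that map function is, here is a simplified sketch of what a text import mapper like the linked one does (it uses a plain Mapper base class for brevity; the real class extends Sqoop's AutoProgressMapper, so treat this as illustrative rather than the exact source):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.sqoop.lib.SqoopRecord;

// Simplified sketch of a text import mapper: the input key is a record
// counter, the input value is the table-specific generated SqoopRecord.
public class TextImportSketchMapper
    extends Mapper<LongWritable, SqoopRecord, Text, NullWritable> {

  private final Text outKey = new Text();

  @Override
  public void map(LongWritable key, SqoopRecord record, Context context)
      throws IOException, InterruptedException {
    // The input key (a record counter) is ignored; the generated
    // SqoopRecord knows how to render itself as a delimited text line.
    outKey.set(record.toString());
    context.write(outKey, NullWritable.get());
  }
}
```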

The data is supplied to a mapper by its InputFormat, or more specifically, its RecordReader. Sqoop reads from the database over JDBC, and that logic is implemented as a RecordReader by this class: https://github.com/cloudera/sqoop/blob/cdh5.15.0-release/src/java/org/apache/sqoop/mapreduce/db/DBRe...
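To make that concrete, here is a hypothetical, heavily simplified JDBC-backed RecordReader in the spirit of that class. The connection string, query, and the newRecordInstance() helper are all made up for illustration; the real class builds its bounded query from the input split and instantiates the table-specific generated record class:

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.sqoop.lib.SqoopRecord;

// Hypothetical sketch of a JDBC-backed RecordReader; not Sqoop's exact code.
public class SimpleDBRecordReader extends RecordReader<LongWritable, SqoopRecord> {

  private Connection conn;
  private ResultSet results;
  private long pos = 0;                        // local record counter
  private final LongWritable key = new LongWritable();
  private SqoopRecord value;                   // instance of the generated class

  @Override
  public void initialize(InputSplit split, TaskAttemptContext ctx) throws IOException {
    try {
      // In the real class, the connection settings and the bounded query
      // (e.g. SELECT ... FROM t WHERE id >= lo AND id < hi) come from the split.
      conn = DriverManager.getConnection("jdbc:mysql://dbhost/mydb", "user", "pass");
      Statement st = conn.createStatement();
      results = st.executeQuery("SELECT id, name FROM t WHERE id >= 0 AND id < 1000");
      value = newRecordInstance();             // hypothetical factory for the generated class
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    try {
      if (results == null || !results.next()) {
        return false;                          // no more rows in this split
      }
      key.set(pos++);                          // key is just a running counter
      value.readFields(results);               // generated code copies the row's columns
      return true;
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }

  @Override public LongWritable getCurrentKey() { return key; }
  @Override public SqoopRecord getCurrentValue() { return value; }
  @Override public float getProgress() { return 0f; }   // the real class estimates progress
  @Override public void close() throws IOException {
    try { if (conn != null) conn.close(); } catch (SQLException e) { throw new IOException(e); }
  }

  // Placeholder: Sqoop instantiates the table-specific generated class here.
  private SqoopRecord newRecordInstance() {
    throw new UnsupportedOperationException("illustrative only");
  }
}
```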

Effectively, for a given query boundary (the boundaries are decided from some key's value range and the number of mappers requested at submit time), a JDBC connection reads each record and passes it as a value into the map function, which then writes it out in the desired format. The key seen by the map task is just a local record counter and is ignored entirely as an input.
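As for how those boundaries are computed: roughly, Sqoop queries the MIN and MAX of the split-by column and divides that range evenly among the mappers, giving each one a bounded WHERE clause. A hypothetical sketch in the spirit of Sqoop's integer splitter (not the exact code; the remainder handling in the real splitters is more careful):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: carve an integer split-by column's [min, max] range
// into per-mapper WHERE clauses.
public class SplitSketch {

  static List<String> makeSplits(String col, long min, long max, int numMappers) {
    List<String> conditions = new ArrayList<>();
    long size = (max - min) / numMappers + 1;  // approximate width of each range
    for (long lo = min; lo <= max; lo += size) {
      long hi = Math.min(lo + size, max + 1);
      // Each condition bounds one mapper's SELECT over the split-by column.
      conditions.add(col + " >= " + lo + " AND " + col + " < " + hi);
    }
    return conditions;
  }

  public static void main(String[] args) {
    // e.g. SELECT MIN(id), MAX(id) FROM t returned 1 and 1000, with 4 mappers:
    makeSplits("id", 1, 1000, 4).forEach(System.out::println);
    // id >= 1 AND id < 251
    // id >= 251 AND id < 501
    // id >= 501 AND id < 751
    // id >= 751 AND id < 1001
  }
}
```

Each of those conditions ends up embedded in one mapper's SELECT, which is why the number of concurrent database reads equals the number of mappers you request, not the table's size.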

Re: How does MapReduce work in a Sqoop import?

Can you point to any code in Sqoop showing how the split range for each mapper is determined using the split-by column?


Re: How does MapReduce work in a Sqoop import?

In an RDBMS the database block size is 8 KB, while in Hadoop the block size is 64 MB. In my Sqoop import example, the RDBMS table size is 300 MB. So will it split into 5 mappers? Please confirm.
