How to create a hadoop custom inputformat/fileinputformat

melvinmendoza — Wed, 07 Mar 2018 15:50:47 GMT

Does anyone know how to do this, or have a tutorial?

Re: How to create a hadoop custom inputformat/fileinputformat

kgautam — Wed, 07 Mar 2018 18:43:41 GMT

Please first get a basic understanding of InputFormats and RecordReaders:

http://bytepadding.com/big-data/map-reduce/how-records-are-handled-map-reduce/

Example of a custom InputFormat:

http://bytepadding.com/big-data/spark/combineparquetfileinputformat/

A few pointers:
1. Start with a basic understanding of splits, InputFormats, RecordReaders, file formats and compression.
2. Go through the code of TextInputFormat: http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.1.0-pre7/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.java
3. FileInputFormat is the abstract base class for all file-based input formats. Go through its basic functionality: http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.1.0-pre7/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#FileInputFormat
4. Decide what the logical record for your InputFormat is and what the splitting strategy is. Depending on this, extend FileInputFormat and override/implement the getSplits() and getRecordReader() methods.

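Important FileInputFormat methods:
getSplits(): each task reads one split; this method decides which file, start offset and length each split covers.
getRecordReader() (named createRecordReader() in the newer org.apache.hadoop.mapreduce API that the links above point to): defines how the bytes of the split being read are converted into records.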
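
To make the split / record reader idea concrete, here is a minimal plain-Java sketch of the logic, with no Hadoop dependency. It is not the Hadoop API: the class SplitDemo and the methods computeSplits() and readRecords() are hypothetical names used only for illustration. computeSplits() plays the role of getSplits(), and readRecords() plays the role of a line-based record reader. It assumes newline-terminated records and follows the convention used for text files: a reader whose split does not start at offset 0 skips its first (possibly partial) line, and every reader finishes any line that begins at or before the end of its split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SplitDemo {

    /** Cuts the byte range [0, fileLength) into consecutive splits of {start, length}. */
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return splits;
    }

    /** Returns the newline-terminated records owned by the split [start, start + length). */
    static List<String> readRecords(byte[] file, long start, long length) {
        int pos = (int) start;
        int end = (int) (start + length);

        // A split that does not begin at offset 0 skips up to and including the
        // first newline: that line is read by the reader of the previous split.
        if (start != 0) {
            while (pos < file.length && file[pos++] != '\n') {
                // keep scanning
            }
        }

        List<String> records = new ArrayList<>();
        // Any line that begins at or before 'end' belongs to this split,
        // even when its bytes run past the split boundary.
        while (pos <= end && pos < file.length) {
            int lineStart = pos;
            while (pos < file.length && file[pos] != '\n') {
                pos++;
            }
            records.add(new String(file, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step over the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        for (long[] split : computeSplits(file.length, 8)) {
            System.out.println("split start=" + split[0] + " length=" + split[1]
                    + " records=" + readRecords(file, split[0], split[1]));
        }
    }
}
```

With the sample data and a split size of 8 bytes, the first split yields alpha and bravo, the second yields charlie, and the third yields nothing, so every record is read exactly once even though the split boundaries fall in the middle of lines. In a real InputFormat the same decisions live in getSplits() and in your RecordReader implementation.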

Re: How to create a hadoop custom inputformat/fileinputformat

melvinmendoza — Thu, 08 Mar 2018 09:04:59 GMT

Thanks @kgautam, this is really helpful.
