Member since
08-26-2016
4
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1502 | 03-29-2017 07:46 AM |
03-29-2017 07:46 AM
Hello all, thanks @ssingla and @Umair Khan for your answers. In the end, I found a solution: I derive a class from `FileInputFormat` and override the `getSplits` method so that it returns only the splits corresponding to the wanted part of the HDFS file. In this method, I first call the super class to get the splits generated by `FileInputFormat`. From the job configuration, I read the start and end offsets of the part of the HDFS file I want to process. Finally, the start and end of each split returned by the super class's `getSplits` method are compared to these offsets, and a split is returned only if it matches the wanted part of the HDFS file.
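For anyone wanting to reproduce this, here is a minimal sketch of the comparison step only, not my exact code. It is plain Java with no Hadoop dependency: the `Range` class is a hypothetical stand-in for Hadoop's `FileSplit` (its fields mirror `FileSplit.getStart()` and `FileSplit.getLength()`), and `SplitRangeFilter` and `keepOverlapping` are illustrative names. It also assumes that a split "matches" when it overlaps the wanted byte range. In a real job, this check would presumably run inside the overridden `getSplits(JobContext)` on the list returned by `super.getSplits(job)`, with the two offsets read from `job.getConfiguration()` under property names of your choosing.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the range check only; Range is a hypothetical stand-in for Hadoop's FileSplit. */
class SplitRangeFilter {

    /** Half-open byte range [start, start + length), mirroring FileSplit.getStart()/getLength(). */
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }

        long end() {
            return start + length;
        }
    }

    /** Assumption: a split is kept when it overlaps the wanted range [wantedStart, wantedEnd). */
    static List<Range> keepOverlapping(List<Range> splits, long wantedStart, long wantedEnd) {
        List<Range> kept = new ArrayList<>();
        for (Range s : splits) {
            if (s.start < wantedEnd && s.end() > wantedStart) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Four consecutive 128 MB splits, as a stand-in for what the super class might return.
        List<Range> splits = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            splits.add(new Range(i * 128L * mb, 128L * mb));
        }
        // Keep only the splits touching the first 40 MB of the file.
        List<Range> kept = keepOverlapping(splits, 0L, 40L * mb);
        System.out.println("kept splits: " + kept.size());
    }
}
```

Note that with an overlap test, a kept split is still processed in full by its mapper, so the job may read past the wanted end offset up to the next split boundary unless the record reader also stops early.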
03-27-2017 02:08 PM
Hello everybody, I have a big file in HDFS (~20 GB) on which I usually run a MapReduce job. Around 170 mappers are created. The InputFormat used is a `FileInputFormat`. Now I would like to run the MapReduce job on only a part of the file (for example, the first 40 MB). Is there a simple way to do this? Thanks for your help.
Labels:
- Apache Hadoop