
Execute a MapReduce job on only part of an HDFS file


Hello everybody,

I have a big file in HDFS (~20 GB) on which I usually run a MapReduce job; around 170 mappers are created. The InputFormat used is FileInputFormat.

Now I would like to run the MapReduce job on only a part of the file (for example, the first 40 MB).

Is there a simple way to do this?

Thanks for your help.

3 REPLIES

Expert Contributor

Try something like:

hadoop fs -cat /path_to_hdfs_file/test.csv | head -c 40000000
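
That only streams the first 40 MB to stdout, so to actually run the job on that data, one rough approach (the output file name below is just a placeholder) is to write the truncated data back to HDFS and point the existing job at the smaller file:

# Sketch: write the first 40 MB to a new HDFS file (test_40mb.csv is an assumed name)
hadoop fs -cat /path_to_hdfs_file/test.csv | head -c 40000000 | hadoop fs -put - /path_to_hdfs_file/test_40mb.csv
# then submit the same MapReduce job with /path_to_hdfs_file/test_40mb.csv as the input path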

Rising Star

I would recommend splitting up the file and then running your MR job on each of the resulting parts.
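
A rough sketch of that approach, assuming illustrative paths and a 40 MB chunk size (and enough local disk to hold a copy of the file):

# Pull the file locally, split it into 40 MB chunks, and push the chunks back to HDFS
hadoop fs -get /path_to_hdfs_file/test.csv .
split -b 40m test.csv test_part_
hadoop fs -mkdir -p /path_to_hdfs_file/parts
hadoop fs -put test_part_* /path_to_hdfs_file/parts/
# the existing job can then be run against a single chunk, e.g. /path_to_hdfs_file/parts/test_part_aa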
