How to combine Hive table files for input to mapreduce?

I have an HDP 2.0 cluster where I'm running a MapReduce program that takes a Hive (0.14) table as input. The Hive table consists of a large number of small files, so a large number of mapper containers are requested. Is there a way to combine the small files before they are passed to the MapReduce job?

1 ACCEPTED SOLUTION

Re: How to combine Hive table files for input to mapreduce?

@Phoncy Joseph

You can set the input split size in Hive to a higher value to reduce the number of mappers, but you might also need to increase the mapper heap size.

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

set mapred.min.split.size=100000000;
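With a combining input format, how much data ends up in one split usually depends on the maximum split size and the per-node and per-rack minimums, not only on the overall minimum. Here is a minimal sketch of a fuller set of session settings. The byte values are illustrative assumptions, not tuned recommendations, and the property names are the older `mapred.*` spellings, which newer Hadoop releases map to `mapreduce.input.fileinputformat.split.*`.

```sql
-- Combine small files into larger splits (all byte values are illustrative assumptions)
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- Upper bound on the size of one combined split, in bytes
set mapred.max.split.size=256000000;
-- Lower bound on the size of a split, in bytes
set mapred.min.split.size=100000000;
-- Minimum bytes to collect per node, then per rack, before leftovers are merged more widely
set mapred.min.split.size.per.node=100000000;
set mapred.min.split.size.per.rack=100000000;
```

As far as I know, these settings take effect for queries that Hive itself runs. A standalone MapReduce job that reads the table's files directly would probably need a combining input format configured in its own driver instead.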

Or

Try using a Hadoop archive (HAR) to pack the small files into a single file.

https://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html#Looking+Up+Files
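If the table is partitioned, Hive can also build the archive for a partition itself. The following is a sketch only: `my_table`, the partition column `ds`, and its value are hypothetical placeholders, and it assumes archiving is supported in your Hive build.

```sql
-- Enable Hive's built-in archiving support for this session
set hive.archive.enabled=true;
-- Pack the files of one partition into a single HAR
-- (my_table and ds='2015-01-01' are hypothetical placeholders)
ALTER TABLE my_table ARCHIVE PARTITION (ds='2015-01-01');
-- To restore the original files later:
-- ALTER TABLE my_table UNARCHIVE PARTITION (ds='2015-01-01');
```

Archiving mainly reduces the number of files stored in HDFS. Whether it also lowers the mapper count depends on the input format in use, so it is worth testing on a single partition first.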

3 REPLIES

Re: How to combine Hive table files for input to mapreduce?

Are you using HiveInputFormat? It is better to use CombineInputFormat, which combines the small files to generate a split.

Re: How to combine Hive table files for input to mapreduce?

Are there any counters that can assess this? I am trying the properties above but do not see a reduction in the number of mappers.