New Contributor
Posts: 3
Registered: 05-16-2014

hive.hadoop.supports.splittable.combineinputformat and non-indexed lzo compressed files

Has anyone been using hive.hadoop.supports.splittable.combineinputformat with Hive 0.10 (CDH 4.3.2) and non-indexed (and therefore non-splittable) LZO-compressed files?

We recently tried this parameter with non-indexed LZO files, and at first it looked great. It reduced the number of mappers needed to read the input data, and it did a good job of merging the data into files of roughly the max input split size.
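Here is a rough Python sketch of what "combining input into fewer splits" means when some of the inputs cannot be split. It is a hypothetical illustration, not Hive's CombineHiveInputFormat or Hadoop's CombineFileInputFormat code (the real implementations also group by block, node, and rack). The names `InputFile`, `Chunk`, `is_splittable`, and `combine_splits` are made up for this example. Treating an `.lzo` file as splittable only when a companion `<file>.lzo.index` exists is an assumption based on the hadoop-lzo indexing convention. The property being illustrated is that a non-indexed LZO file should be read whole by a single reader, because without an index there is no known block boundary to start reading from partway through the file.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class InputFile:
    path: str
    size: int  # bytes


@dataclass
class Chunk:
    path: str
    start: int   # byte offset within the file
    length: int  # number of bytes read from this file


def is_splittable(path: str, all_paths: Set[str]) -> bool:
    # Assumption: an .lzo file counts as splittable only if a companion
    # "<file>.lzo.index" exists (the hadoop-lzo indexer convention).
    if path.endswith(".lzo"):
        return path + ".index" in all_paths
    # Assumption: every other file in this sketch is uncompressed text.
    return True


def combine_splits(files: List[InputFile], max_split_size: int) -> List[List[Chunk]]:
    """Greedily pack file chunks into combined splits of at most
    max_split_size bytes, never cutting a non-splittable file."""
    all_paths = {f.path for f in files}
    splits: List[List[Chunk]] = []
    current: List[Chunk] = []
    current_size = 0
    for f in files:
        if f.path.endswith(".index"):
            continue  # index files describe the data; they are not input data
        if is_splittable(f.path, all_paths):
            # Cut splittable files into pieces of at most max_split_size bytes.
            pieces = [Chunk(f.path, off, min(max_split_size, f.size - off))
                      for off in range(0, f.size, max_split_size)]
        else:
            # A non-indexed .lzo file must be read from byte 0 to the end by
            # one reader, so it goes into a single split even if oversized.
            pieces = [Chunk(f.path, 0, f.size)]
        for piece in pieces:
            # Start a new split when adding this piece would exceed the limit.
            if current and current_size + piece.length > max_split_size:
                splits.append(current)
                current, current_size = [], 0
            current.append(piece)
            current_size += piece.length
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    files = [
        InputFile("a.lzo", 300),       # no index: must stay whole
        InputFile("b.lzo", 100),       # no index: must stay whole
        InputFile("d.txt", 50),
        InputFile("c.lzo", 400),       # indexed below, so it may be cut
        InputFile("c.lzo.index", 1),
    ]
    for i, split in enumerate(combine_splits(files, max_split_size=256)):
        print(i, [(c.path, c.start, c.length) for c in split])
```

With a 256-byte limit, this produces four splits. `a.lzo` stays whole even though it is larger than the limit. `b.lzo` and `d.txt` are combined into one split. The indexed `c.lzo` is cut at byte 256.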

 

Unfortunately, we then found that some Hive queries returned different results when run against non-indexed LZO files with this parameter enabled.

Here are a couple of older JIRAs that seemed to indicate we could use this parameter with non-splittable compressed files:

https://issues.apache.org/jira/browse/MAPREDUCE-1597

https://issues.apache.org/jira/browse/HIVE-2089

Perhaps this is a new bug, or perhaps we did something wrong.


Re: hive.hadoop.supports.splittable.combineinputformat and non-indexed lzo compressed files

We found this JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-5537

It was meant to fix a bug that seems similar to what we saw; however, that was back in Hive 0.8.

Perhaps the same bug made it back into Hive 0.10?