Created 02-19-2016 07:30 AM
Is creating indexes on Hive tables recommended?
http://www.slideshare.net/ye.mikez/hive-tuning?next_slideshow=1
It seems to suggest that creating indexes should be avoided. I would like some thoughts from the community on this.
Created 02-19-2016 09:33 AM
The short answer is no. Indexes in Hive are not recommended.
The reason for this is ORC. ORC has built-in indexes that allow the reader to skip blocks of data during a read, and it also supports Bloom filters. Together these pretty much replicate what Hive indexes did, and they do it automatically in the data format without the need to manage an external table (which is essentially what happens with indexes). I would rather spend my time properly setting up the ORC tables.
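To illustrate the block-skipping idea, here is a minimal toy sketch in Python. It is not how ORC is actually implemented; the `RowGroup` class, the `build_row_groups` and `lookup` helpers, and the group size of 100 are all hypothetical names and values chosen for illustration. It only shows the principle: keep min/max statistics per block and skip any block whose range cannot contain the value you are filtering on.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """A block of rows plus the min/max statistics kept for one column (toy model)."""
    values: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split the column into fixed-size blocks and record min/max for each block."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def lookup(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return (matches, groups_read), skipping blocks whose range excludes target."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        if target < group.min_value or target > group.max_value:
            continue  # statistics prove the value is not here, so skip the block
        groups_read += 1
        matches.extend(v for v in group.values if v == target)
    return matches, groups_read


# Hypothetical data: 1000 ids written in sorted order, 100 rows per block.
sorted_ids = list(range(1000))
groups = build_row_groups(sorted_ids, group_size=100)
matches, groups_read = lookup(groups, 742)
print(matches, groups_read, len(groups))
```

In this toy run only one of the ten blocks has to be scanned. The skipping only pays off when the min/max ranges of the blocks do not overlap much, which is why sorting the data on the filter column during load matters.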
Again shameless plug:
http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
Created 02-19-2016 08:03 AM
@Shivaji, have you checked the link below? It gives information about when to avoid using indexes in Hive:
https://acadgild.com/blog/indexing-in-hive/
Another source with some useful information about indexing in Hive can be found by searching for:
index-based join operations in hive - CiteSeer
I hope this helps you get the information you need to decide whether or not to use indexes in Hive.
Created 02-23-2016 05:30 AM
@Shivaji, if the original question is answered, please accept the best answer.
Created 02-23-2016 05:38 AM
@Shivaji I agree with Benjamin. Hive indexes are not recommended.
Created 03-25-2017 06:18 AM
@Benjamin Leonhardi, on slide 24 you note that a small stripe size indicates a memory problem during load. Do you know what memory problem that would be? I have ~3500 records per stripe and was wondering where I should look. Thanks!