Creating Indexes in Hive
Labels: Apache Hive
Created ‎02-19-2016 07:30 AM
Is creating indexes on Hive tables recommended?
http://www.slideshare.net/ye.mikez/hive-tuning?next_slideshow=1
These slides seem to suggest that creating indexes should be avoided. I would like to hear the community's thoughts on this.
Created ‎02-19-2016 09:33 AM
The short answer is no. Indexes in Hive are not recommended.
The reason is ORC. ORC has built-in indexes that let the reader skip blocks of data during a read, and it also supports Bloom filters. Together these pretty much replicate what Hive indexes did, and they do it automatically inside the data format, without the need to manage an external table (which is essentially what a Hive index is). I would rather spend my time setting up the ORC tables properly.
Again, a shameless plug:
http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
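To make the "skip blocks of data" point concrete, here is a minimal Python sketch of min/max-based skipping. It is a toy model, not ORC's actual reader: the names `RowGroup`, `build_row_groups`, and `scan_equals` are made up for illustration, the group size of 100 is arbitrary, and Bloom filters are not modeled at all.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """A block of rows plus the min/max statistics stored alongside it."""
    rows: List[int]
    min_val: int
    max_val: int


def build_row_groups(values: List[int], stride: int) -> List[RowGroup]:
    """Split values into fixed-size groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equals(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return the matching rows and the number of groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        if target < group.min_val or target > group.max_val:
            continue  # statistics rule this group out, so it is never read
        groups_read += 1
        matches.extend(v for v in group.rows if v == target)
    return matches, groups_read


# Sorted data: each group covers a tight, non-overlapping value range.
sorted_groups = build_row_groups(list(range(1000)), stride=100)
hits, read = scan_equals(sorted_groups, 421)
print(hits, read)  # → [421] 1

# Scattered data: group ranges overlap, so fewer groups can be ruled out.
scattered = [(i * 7) % 1000 for i in range(1000)]
scattered_groups = build_row_groups(scattered, stride=100)
scattered_hits, scattered_read = scan_equals(scattered_groups, 421)
print(scattered_hits, scattered_read)
```

With sorted input only one of the ten groups is read. With scattered input the min/max ranges overlap and more groups have to be read. That is one reason the way the data is laid out at load time matters for this kind of index.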
Created ‎02-19-2016 08:03 AM
@Shivaji, have you checked the links below? The first one explains when to avoid using indexes in Hive:
https://acadgild.com/blog/indexing-in-hive/
Another source with useful information about indexing in Hive (or search for the title):
index-based join operations in hive - CiteSeer
I hope this helps you decide whether or not to use indexes in Hive.
Created ‎02-23-2016 05:30 AM
@shivaji, if the original question has been answered, please accept the best answer.
Created ‎02-23-2016 05:38 AM
@Shivaji I agree with Benjamin. Hive indexes are not recommended.
Created ‎03-25-2017 06:18 AM
@Benjamin Leonhardi, on slide 24 you note that a small stripe size indicates a memory problem during load. Do you know what memory problem that would be? I have ~3500 records per stripe and was wondering where I should look. Thanks!
