Creating Indexes in Hive

Is creating indexes on Hive tables recommended?

http://www.slideshare.net/ye.mikez/hive-tuning?next_slideshow=1

These slides seem to suggest that creating indexes should be avoided. I would like to hear the community's thoughts on this.

1 ACCEPTED SOLUTION

Master Guru

The short answer is no. Indexes in Hive are not recommended.

The reason is ORC. ORC has built-in indexes that allow the reader to skip blocks of data, and it also supports Bloom filters. Together these pretty much replicate what Hive indexes did, and they work automatically inside the data format, without the need to manage an external table (which is essentially what a Hive index is). I would rather spend my time setting up the ORC tables properly.

Again, a shameless plug:

http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
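
To make the "skip blocks of data during read" point above more concrete, here is a minimal conceptual sketch in Python. It is not ORC's actual implementation: the `RowGroup` class and the `build_row_groups` and `lookup` functions are hypothetical names invented for this illustration. It only shows the general idea of keeping min/max statistics per block so that a reader can rule out blocks without scanning them. In Hive itself, this behavior is controlled through ORC table properties such as `orc.create.index` and `orc.bloom.filter.columns`, not through code like this.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """A block of rows plus the min/max statistics stored alongside it."""
    rows: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], stride: int) -> List[RowGroup]:
    """Split values into fixed-size groups and record min/max for each group."""
    groups = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def lookup(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return the matching rows and the number of groups actually scanned."""
    matches: List[int] = []
    scanned = 0
    for group in groups:
        # The statistics prove the value cannot be in this group: skip it.
        if target < group.min_value or target > group.max_value:
            continue
        scanned += 1
        matches.extend(v for v in group.rows if v == target)
    return matches, scanned


# Sorted data keeps the min/max ranges narrow, so most groups are skipped.
groups = build_row_groups(list(range(1000)), stride=100)
matches, scanned = lookup(groups, 437)
print(matches, scanned)
```

With sorted input, only one of the ten groups has to be read. If the same values were loaded in random order, most groups would have wide min/max ranges and few could be skipped, which is one reason the way data is loaded into ORC tables matters.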


5 REPLIES

@Shivaji, have you checked the links below? This one explains when to avoid using indexes in Hive:

https://acadgild.com/blog/indexing-in-hive/

Another link with useful information about indexing in Hive:

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5E97F9C310D5978ED19CF9F0E96D2407?doi=10.1.1...

or search

index-based join operations in hive - CiteSeer

I hope this helps you decide whether to use indexes in Hive.

@Shivaji, if the original question has been answered, please accept the best answer.

Master Mentor

@Shivaji I agree with Benjamin. Hive indexes are not recommended.

Rising Star

@Benjamin Leonhardi, on slide 24 you note that a small stripe size indicates a memory problem during load. Do you know what memory problem that would be? I have about 3500 records per stripe and was wondering where I should look. Thanks!