Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Bloom filter maintenance or updates?

New Member

I have an ORC table that I update daily with the contents of a CSV file. When I created the table, I specified a bloom filter column. Is there any maintenance I need to perform after subsequent inserts? The table holds about 500 million records and gets 50 million new records daily.

3 REPLIES

Master Guru

Are you using Hive transactions or normal inserts? A normal insert doesn't change anything in the existing data: it creates new ORC files, and every ORC file has its own bloom filter index, so there is no maintenance to do. I am pretty sure the same is true for ACID tables as well, since the compactor effectively creates a new ORC file.
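
To make the "every file has its own bloom filter" point concrete, here is a minimal toy sketch in Python. It is not Hive or ORC code: the names BloomFilter, ToyOrcFile and ToyTable are made up for illustration, and it assumes one filter per file, whereas real ORC keeps its bloom filters at a finer granularity inside each file. The point it shows is only that an insert adds a new file with a freshly built filter and never touches the filters of older files.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: a few hash probes into a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, value):
        # Derive one bit position per seed from a salted SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = True

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


class ToyOrcFile:
    """Hypothetical stand-in for one ORC file: rows plus a filter built at write time."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = tuple(keys)
        self.bloom = BloomFilter()
        for key in self.keys:
            self.bloom.add(key)


class ToyTable:
    """Hypothetical table: a growing list of write-once files."""

    def __init__(self):
        self.files = []

    def insert(self, keys):
        # An insert only appends a new file; existing files are left alone.
        name = f"file_{len(self.files)}"
        self.files.append(ToyOrcFile(name, keys))
        return name

    def lookup(self, key):
        # Skip any file whose filter rules the key out, then confirm in the rows.
        return [f.name for f in self.files
                if f.bloom.might_contain(key) and key in f.keys]


table = ToyTable()
table.insert(["a", "b"])                      # initial load
old_bits = list(table.files[0].bloom.bits)    # snapshot of the first file's filter
table.insert(["c"])                           # the daily insert
unchanged = table.files[0].bloom.bits == old_bits
```

After the second insert, `unchanged` is still true and `table.lookup("c")` finds the key only in the new file, which is why nothing needs to be rebuilt by hand in this model.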

New Member

Just basic inserts. This is great to learn - thanks for the quick reply!

Master Guru

Yes. If you want to see it in action, look at the table's HDFS folder before and after the insert (you should see a couple of new files like 00000_1 ... in there). These new output files from your insert job hold the newly added rows. You can look at the bloom filter indexes with hive --orcfiledump -rowindex ... <filename>
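
If you would rather not eyeball the two folder listings, a small script can diff them. This is only a sketch: it assumes you saved the text output of an `hdfs dfs -ls` on the table folder before and after the insert, and the helper names, paths and file names below are hypothetical sample data, not output from a real cluster.

```python
def parse_ls_paths(listing):
    """Collect file paths from `hdfs dfs -ls`-style text."""
    paths = set()
    for line in listing.splitlines():
        # Entry lines carry 8 fields: permissions, replication, owner,
        # group, size, date, time, path. "Found N items" has fewer and is skipped.
        fields = line.strip().split(None, 7)
        if len(fields) == 8:
            paths.add(fields[7])
    return paths


def new_files(before, after):
    """Return paths present in the second listing but not in the first."""
    return sorted(parse_ls_paths(after) - parse_ls_paths(before))


# Hypothetical listings captured before and after an insert.
BEFORE = """Found 1 items
-rwxrwxrwx   3 hive hdfs    1048576 2016-05-01 02:00 /warehouse/events/000000_0
"""

AFTER = """Found 2 items
-rwxrwxrwx   3 hive hdfs    1048576 2016-05-01 02:00 /warehouse/events/000000_0
-rwxrwxrwx   3 hive hdfs     524288 2016-05-02 02:00 /warehouse/events/000000_0_copy_1
"""

added = new_files(BEFORE, AFTER)
```

Each path in `added` is a file written by the insert, and those are the ones worth passing to the orcfiledump command above.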

http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data