Bloom filter maintenance or updates?

Explorer

I have an ORC table that I update daily with the contents of a CSV file. When I created the table, I specified a bloom filter column. Is there any maintenance I need to perform after subsequent inserts? The table holds about 500 million records and receives about 50 million new records each day.

3 REPLIES

Master Guru

Are you using Hive transactions (ACID), or normal inserts? A normal insert requires no maintenance: each insert writes new ORC files, and every ORC file carries its own bloom filter index. I am fairly sure the same holds for ACID tables, since the compactor effectively writes a new ORC file.
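To make the "every file has its own bloom filter" point concrete, here is a small Python sketch. It is a toy model, not ORC's actual implementation: the classes `BloomFilter` and `ToyOrcFile`, the file names, and the keys are all made up for illustration, and real ORC keeps bloom filters at a finer granularity inside each file. The only thing it tries to show is that adding a new file leaves the filters of existing files untouched.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter for illustration only (not ORC's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


class ToyOrcFile:
    """Hypothetical stand-in for one ORC file: rows plus a filter built at write time."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = tuple(keys)
        self.bloom = BloomFilter()
        for key in self.keys:
            self.bloom.add(key)


def insert(table, name, keys):
    """Model a plain insert: append a new file, never modify existing ones."""
    table.append(ToyOrcFile(name, keys))


def files_to_scan(table, key):
    """Return the files whose filter cannot rule out the key."""
    return [f.name for f in table if f.bloom.might_contain(key)]


table = []
insert(table, "file_day1", ["a1", "a2", "a3"])
bits_before = table[0].bloom.bits

insert(table, "file_day2", ["b1", "b2"])
bits_after = table[0].bloom.bits  # unchanged by the second insert
```

In this model the first file's filter is identical before and after the second insert, which is why no separate maintenance step is needed for plain inserts.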

Explorer

Just basic inserts. This is great to learn - thanks for the quick reply!

Master Guru

Yes. If you want to see it in action, look at the table's HDFS folder before and after the insert (you should see a couple of new files like 00000_1 ... in there). These are the new output files from your insert job, and they contain the newly added rows. You can look at the bloom filter indexes with hive --orcfiledump --rowindex ... <filename>
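The before/after check can be sketched in Python as follows. This is only an illustration under some assumptions: the helper names `new_files` and `orcfiledump_command` are hypothetical, the listings are plain lists of file names you have collected yourself, and the column ids passed to `--rowindex` are placeholders that depend on your table.

```python
def new_files(before, after):
    """Return names present in the second listing but not in the first, sorted."""
    return sorted(set(after) - set(before))


def orcfiledump_command(path, column_ids):
    """Build an argument list for dumping the row index of the given column ids.

    Assumes the `hive --orcfiledump --rowindex <col_ids> <path>` form;
    check the options available in your Hive version before relying on it.
    """
    ids = ",".join(str(c) for c in column_ids)
    return ["hive", "--orcfiledump", "--rowindex", ids, path]


# Hypothetical listings taken before and after an insert job.
before = ["part_a"]
after = ["part_a", "part_b"]

added = new_files(before, after)
commands = [orcfiledump_command(name, [1]) for name in added]
```

Each entry in `commands` could then be passed to `subprocess.run` on a machine where the `hive` client is installed.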

http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data