
Bloom filter maintenance or updates?

New Contributor

I have an ORC table that I am updating daily with the contents of a CSV file. When I created the table, I specified a bloom filter column. Is there any maintenance I need to perform with subsequent inserts? The table is about 500MM records and gets 50MM new records daily.

3 REPLIES

Re: Bloom filter maintenance or updates?

Are you using Hive transactions (ACID), or normal inserts? A normal insert requires no maintenance: it writes new ORC files, and every ORC file carries its own bloom filter index, so the existing files and their indexes are left untouched. I am pretty sure the same is true for ACID tables as well, since the compactor effectively creates a new ORC file.
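To make the reasoning above concrete, here is a minimal toy model in Python. It is only a sketch and not ORC's actual implementation (ORC keeps its bloom filters at a finer granularity inside each file, and uses its own hashing). The names `BloomFilter`, `ToyFile`, `ToyTable`, and the `file_N` naming are hypothetical, invented for illustration. The point it shows is that each file's filter is built once when the file is written, so a later insert never has to touch an existing filter.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: k hash probes into an m-bit array (toy sizes)."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, value):
        # Derive k bit positions by salting the value with the probe number.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


class ToyFile:
    """Immutable 'file': its rows plus a filter built once at write time."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = tuple(keys)
        self.bloom = BloomFilter()
        for key in self.keys:
            self.bloom.add(key)


class ToyTable:
    """A table is just a growing list of immutable files."""

    def __init__(self):
        self.files = []

    def insert(self, keys):
        # An insert appends a new file; existing files and filters are untouched.
        self.files.append(ToyFile(f"file_{len(self.files)}", keys))

    def lookup(self, key):
        # Skip any file whose filter rules the key out, then check the rest.
        return [
            f.name
            for f in self.files
            if f.bloom.might_contain(key) and key in f.keys
        ]
```

In this model, calling `insert` a second time leaves the first file's `bloom.bits` unchanged, which mirrors why a daily insert into the table needs no bloom filter maintenance.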

Re: Bloom filter maintenance or updates?

New Contributor

Just basic inserts. This is great to learn - thanks for the quick reply!

Re: Bloom filter maintenance or updates?

Yeah, if you want to see it in action, look at the table's HDFS folder before and after the insert (you should see a couple of new files like 00000_1 ... in there). These new output files from your insert job hold the newly added rows. You can look at the bloom filter indexes with hive --orcfiledump -rowindex ... <filename>

http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
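
The before/after folder check described in this reply could be scripted roughly as follows. This is a sketch under assumptions: it assumes the `hdfs` command is on the PATH and that the Hadoop release supports the `-C` (paths only) option of `hdfs dfs -ls`. The helper names `list_paths` and `new_files` and the directory in the usage comment are hypothetical.

```python
import subprocess


def list_paths(table_dir):
    """Return the paths under table_dir via `hdfs dfs -ls -C` (paths only).

    Assumes the `hdfs` CLI is installed and its `ls` supports `-C`.
    """
    out = subprocess.run(
        ["hdfs", "dfs", "-ls", "-C", table_dir],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def new_files(before, after):
    """Paths present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


# Usage sketch (the directory below is a made-up placeholder):
#   before = list_paths("/path/to/warehouse/my_table")
#   ... run the daily insert ...
#   after = list_paths("/path/to/warehouse/my_table")
#   print(new_files(before, after))
```

Each path it reports is one of the newly written output files, which can then be passed to the orcfiledump command mentioned above.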