Bloom filter maintenance or updates?

Explorer

I have an ORC table that I update daily with the contents of a CSV file. When I created the table, I specified a bloom filter column. Is there any maintenance I need to perform after subsequent inserts? The table holds about 500 million records and gets 50 million new records daily.

Are you using Hive transactions, or normal inserts? A normal insert requires no bloom filter maintenance: each insert writes new ORC files, and every ORC file carries its own bloom filter index, so the existing files and their indexes are left untouched. I am pretty sure the same is true for ACID tables as well, since the compactor effectively creates a new ORC file.
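
To make the "each file has its own index" point concrete, here is a toy sketch in Python. It is not Hive or ORC code: `BloomFilter`, `ToyOrcFile`, and `ToyTable` are hypothetical names invented for illustration, and the model keeps one filter per file, whereas real ORC stores its bloom filters at a finer granularity inside each file. It only shows why appending data needs no maintenance of the older filters.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: a few hash probes into a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, value):
        # Derive one bit position per seed from a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = True

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


class ToyOrcFile:
    """Stand-in for one output file: rows plus a filter built once at write time."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = list(keys)
        self.bloom = BloomFilter()
        for key in self.keys:
            self.bloom.add(key)


class ToyTable:
    """Stand-in for a table directory holding several files."""

    def __init__(self):
        self.files = []

    def insert(self, keys):
        # Each insert writes a brand-new file; existing files are not modified.
        self.files.append(ToyOrcFile(f"file_{len(self.files)}", keys))

    def lookup(self, key):
        """Return (names of files containing key, number of files skipped)."""
        hits, skipped = [], 0
        for f in self.files:
            if not f.bloom.might_contain(key):
                skipped += 1  # the filter rules this file out without reading rows
                continue
            if key in f.keys:
                hits.append(f.name)
        return hits, skipped


table = ToyTable()
table.insert(["a", "b"])        # initial load
table.insert(["c"])             # later daily insert
hits, skipped = table.lookup("c")
print(hits, skipped)
```

The second `insert` never touches the filter built for the first file, which is the reason no rebuild step is needed in this model.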

Explorer

Just basic inserts. This is great to learn - thanks for the quick reply!

Yeah, if you want to see it in action, look at the table's HDFS folder before and after the insert (you should see a couple of new files like 00000_1 ... in there). These new output files hold the rows added by your insert job. You can look at the bloom filter indexes with hive --orcfiledump -rowindex ... <filename>

http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
