Created 06-27-2016 10:11 PM
I have an ORC table that I am updating daily with the contents of a CSV file. When I created the table, I specified a bloom filter column. Is there any maintenance I need to perform with subsequent inserts? The table is about 500MM records and gets 50MM new records daily.
Created 06-27-2016 10:27 PM
Are you using Hive transactions (ACID), or normal inserts? A normal insert needs no maintenance: each insert writes new ORC files, and every ORC file carries its own bloom filter index, so the existing files and their indexes are left untouched. I am pretty sure the same is true for ACID tables as well, since the compactor effectively creates a new ORC file.
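To make the "each file has its own bloom filter" point concrete, here is a minimal toy sketch in Python. It is not how Hive or ORC is implemented: `BloomFilter`, `ToyOrcFile`, `ToyTable`, the file names and the filter sizes are all hypothetical, and a real ORC file stores its bloom filters inside the file alongside the row index rather than as a single filter per file. It only shows that an insert adds a new file with a fresh filter and does not modify the filters already written.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter backed by a Python int used as a bit set."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


class ToyOrcFile:
    """Hypothetical stand-in for one ORC file: rows and filter are written together."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = tuple(keys)
        self.bloom = BloomFilter()
        for key in self.keys:
            self.bloom.add(key)


class ToyTable:
    """Hypothetical stand-in for a table directory holding several files."""

    def __init__(self):
        self.files = []

    def insert(self, keys):
        # Each insert appends a new file; existing files are never rewritten.
        name = f"file_{len(self.files)}"  # made-up name, not Hive's naming scheme
        self.files.append(ToyOrcFile(name, keys))

    def files_to_scan(self, key):
        # A reader can skip every file whose filter rules the key out.
        return [f.name for f in self.files if f.bloom.might_contain(key)]


table = ToyTable()
table.insert(["a1", "a2", "a3"])      # initial load
old_bits = table.files[0].bloom.bits
table.insert(["b1", "b2"])            # daily insert
print(len(table.files))               # 2
print(table.files[0].bloom.bits == old_bits)  # True: old filter untouched
print(table.files_to_scan("b1"))
```

Because a bloom filter can return false positives, `files_to_scan` may list a file that does not actually hold the key, but it never omits one that does.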
Created 06-27-2016 10:30 PM
Just basic inserts. This is great to learn - thanks for the quick reply!
Created 06-27-2016 10:33 PM
Yeah, if you want to see it in action, look at the table's HDFS folder before and after the insert (you should see a couple of new files like 00000_1 ... in there). These new output files from your insert job hold the newly added rows. You can look at the bloom filter indexes with hive --orcfiledump --rowindex ... <filename>
http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
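If you want to script the before/after folder check described above, a small sketch like the following works on two directory listings you have already captured (for example, by saving the file paths from an HDFS listing before and after the insert). The function name `new_files` and the example paths are hypothetical.

```python
def new_files(before, after):
    """Return the paths present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


# Hypothetical listings of the table folder, captured before and after an insert.
listing_before = ["/warehouse/mytable/part_a"]
listing_after = ["/warehouse/mytable/part_a", "/warehouse/mytable/part_b"]

for path in new_files(listing_before, listing_after):
    print(path)
```

Each path this reports is a file written by the insert job, and is a candidate to pass to the ORC file dump command to inspect its bloom filter index.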