I have an ORC table that I am updating daily with the contents of a CSV file. When I created the table, I specified a bloom filter column. Is there any maintenance I need to perform with subsequent inserts? The table is about 500MM records and gets 50MM new records daily.
Are you using Hive transactions (ACID) or a normal insert? A normal insert doesn't change anything: it writes new ORC files, and every ORC file carries its own bloom filter index, so there is nothing to maintain on the existing files. I am pretty sure the same is true for ACID tables as well, since the compactor effectively creates a new ORC file.
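To make that concrete, here is a rough HiveQL sketch of the setup being described. The table, column, and staging-table names are made up for illustration; only the two orc.* table properties are real ORC settings. It assumes the CSV is exposed through a staging table and loaded with a plain (non-ACID) insert.

```sql
-- Hypothetical table and column names; adjust to your schema.
CREATE TABLE events_orc (
  event_id    BIGINT,
  customer_id STRING,
  payload     STRING
)
STORED AS ORC
TBLPROPERTIES (
  'orc.bloom.filter.columns' = 'customer_id',  -- columns that get a bloom filter
  'orc.bloom.filter.fpp'     = '0.05'          -- target false-positive rate (0.05 is the ORC default)
);

-- Daily load from a staging table assumed to sit on top of the CSV file.
-- INSERT INTO writes new ORC files; the ORC writer builds their bloom
-- filters from the table properties, and existing files are left untouched.
INSERT INTO TABLE events_orc
SELECT event_id, customer_id, payload
FROM events_csv_staging;
```

Since the bloom filter settings live in the table properties, each day's insert should pick them up without any extra step.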
Yeah, if you want to see it in action, look at the table's HDFS folder before and after the insert (you should see a couple of new files like 00000_1 ... in there). Those new output files hold the rows added by your insert job. You can look at the bloom filter indexes with hive --orcfiledump -rowindex ... <filename>