Created 01-03-2020 03:36 AM
I have a Hive 3 transactional (ACID) table that is being streamed into by the NiFi PutHive3Streaming processor.
When I query the table (select count(*)), it fails with:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: <HDFS path to managed table>/<partition>/delta<...>/bucket_0000*
I can also see compaction jobs running for the table, which I believe is why the delta file was deleted, but I cannot understand why that makes the query fail.
Can we not query a table while a compaction job is running on it?
I am on HDP-3.1.0.0, running Hive on Tez.
Created 01-03-2020 06:35 AM
That is correct. You should have the compaction events scheduled for an appropriate time when the maintenance will not interrupt queries.
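As a rough sketch of what that scheduling could look like in HiveQL: the table name streamed_events and the partition column dt are hypothetical placeholders, and whether disabling automatic compaction is acceptable depends on how quickly delta files accumulate from the streaming writes.

```sql
-- Hypothetical table and partition names; substitute your own.

-- Turn off automatic compaction for this table only.
ALTER TABLE streamed_events SET TBLPROPERTIES ('NO_AUTO_COMPACTION' = 'true');

-- Later, during a quiet period, request compaction of one partition manually.
ALTER TABLE streamed_events PARTITION (dt = '2020-01-03') COMPACT 'major';

-- Inspect the compaction queue and the state of each request.
SHOW COMPACTIONS;
```

The manual COMPACT statement only enqueues a request, so SHOW COMPACTIONS is the way to confirm when it has actually run.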
Here is a community search for compaction which will net many educational results:
Created 01-09-2020 02:21 AM
What should be done in a scenario where a maintenance window is not possible?
This contradicts what the documentation says:
"All compactions are done in the background and do not prevent concurrent reads and writes of the data. After a compaction the system waits until all readers of the old files have finished and then removes the old files."
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Compactor