I have a Hive 3 transactional table that NiFi's PutHive3Streaming processor streams into.
When I query the table (select count(*)), the query fails with:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: <HDFS path to managed table>/<partition>/delta<...>/bucket_0000*
I can also see compaction jobs running for the table, which I believe is why the delta file was deleted, but I cannot understand why the query fails.
Can we not query a table while a compaction job is running on it?
I am on HDP-188.8.131.52, running Hive on Tez.
That is correct. You should have the compaction events scheduled for an appropriate time when the maintenance will not interrupt queries.
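A minimal sketch of one way to schedule compaction yourself, assuming a hypothetical table my_db.my_table partitioned by a hypothetical column dt (neither name comes from the question). It turns off automatic compaction for the table, requests a major compaction for one partition at a time you choose, and then checks the compaction queue. Whether this avoids the error above on your cluster is not something the sketch can confirm.

```sql
-- Hypothetical names: my_db.my_table and the partition column dt are placeholders.

-- Stop the compaction initiator from scheduling compactions for this table automatically.
ALTER TABLE my_db.my_table SET TBLPROPERTIES ("NO_AUTO_COMPACTION"="true");

-- At a quiet time, request a major compaction for a single partition.
ALTER TABLE my_db.my_table PARTITION (dt='2020-01-01') COMPACT 'major';

-- List queued, running, and finished compactions with their current state.
SHOW COMPACTIONS;
```

With automatic compaction disabled, delta directories keep accumulating until a compaction is requested manually, so the manual requests would need to run regularly (for example from a scheduler).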
Here is a community search for compaction which will net many educational results:
What should be done when a maintenance window is not possible?
This contradicts what the documentation says:
"All compactions are done in the background and do not prevent concurrent reads and writes of the data. After a compaction the system waits until all readers of the old files have finished and then removes the old files."