
Querying a Hive 3 transaction-enabled table that is being streamed to


I have a Hive 3 transaction-enabled table that is being streamed into by the NiFi PutHive3Streaming processor.
When I query the table (select count(*)), the query fails with:

org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: <HDFS path to managed table>/<partition>/delta<...>/bucket_0000*

 

I can also see compaction jobs running for the table, which I believe is why the delta file was deleted, but I cannot understand why the query fails.
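
For context, here is a rough Python sketch of why a delta directory can disappear once a compaction has produced a base directory. It assumes the usual Hive ACID layout, where directories are named delta_<minWriteId>_<maxWriteId>[_<stmtId>] and base_<writeId>, and it simplifies the real rules: it ignores minor compaction and open transactions. The function names and sample directory names are hypothetical and are not taken from the failing table.

```python
import re
from typing import List, NamedTuple, Optional


class AcidDir(NamedTuple):
    name: str
    kind: str          # "base" or "delta"
    min_write_id: int
    max_write_id: int


# Assumed naming: delta_<min>_<max>[_<stmtId>] and base_<writeId>.
_DELTA = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")
_BASE = re.compile(r"^base_(\d+)$")


def parse_acid_dir(name: str) -> Optional[AcidDir]:
    """Parse a directory name; return None if it is not a base/delta dir."""
    m = _DELTA.match(name)
    if m:
        return AcidDir(name, "delta", int(m.group(1)), int(m.group(2)))
    m = _BASE.match(name)
    if m:
        return AcidDir(name, "base", 0, int(m.group(1)))
    return None


def obsolete_after_compaction(names: List[str]) -> List[str]:
    """Return directories fully covered by the newest base directory.

    Simplification: only major compaction (base dirs) is modeled.
    """
    dirs = [d for d in map(parse_acid_dir, names) if d is not None]
    newest_base = max(
        (d.max_write_id for d in dirs if d.kind == "base"), default=-1
    )
    obsolete = []
    for d in dirs:
        if d.kind == "base" and d.max_write_id < newest_base:
            obsolete.append(d.name)   # superseded by a newer base
        elif d.kind == "delta" and d.max_write_id <= newest_base:
            obsolete.append(d.name)   # all its rows are already in the base
    return obsolete


# Hypothetical partition listing after a major compaction up to write ID 2.
sample = [
    "delta_0000001_0000001_0000",
    "delta_0000002_0000002_0000",
    "base_0000002",
    "delta_0000003_0000003_0000",
]
print(obsolete_after_compaction(sample))
# ['delta_0000001_0000001_0000', 'delta_0000002_0000002_0000']
```

In this simplified model, a query that planned its reads against the two older delta directories would hit a missing file if those directories were removed before it finished reading them.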

 

Can we not query a table while a compaction job is running on it?

 

 

I am on HDP-3.1.0.0, running Hive on Tez.

 

2 REPLIES

That is correct. You should schedule compaction for a time when the maintenance will not interrupt queries.

 

Here is a community search for compaction which will net many educational results:

 

https://community.cloudera.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=fal...


What should be done when a maintenance window is not possible?

 

This contradicts what the documentation says:


"All compactions are done in the background and do not prevent concurrent reads and writes of the data.  After a compaction the system waits until all readers of the old files have finished and then removes the old files."

 

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Compactor