Querying hive 3 transaction enabled table which is being streamed to

I have a Hive 3 transaction-enabled table that is being streamed into by the NiFi PutHive3Streaming processor.
When I try to query the table (select count(*)), the query fails with:

org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: <HDFS path to managed table>/<partition>/delta<...>/bucket_0000*

 

I can also see compaction jobs running for the table, which I believe is why the delta file is being deleted, but I cannot understand why the query fails.

 

Can we not query a table while a compaction job is running on it?

I am on HDP-3.1.0.0, running Hive on Tez.

 

Re: Querying hive 3 transaction enabled table which is being streamed to

That is correct. You should schedule compactions for a time when the maintenance will not interrupt queries.
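
As a rough sketch of how compaction could be moved to a chosen time, the statements below turn off automatic compaction for one table and request a compaction by hand. The table name `db.events`, the partition column `dt`, and the date value are hypothetical placeholders, not taken from the original post. Keep in mind that with streaming ingest the delta directories keep accumulating until a compaction actually runs, so this is only a starting point, not a tested recipe for this cluster.

```sql
-- Hypothetical table and partition names; substitute your own.

-- Stop the automatic compactor from scheduling compactions for this table.
ALTER TABLE db.events SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true');

-- At a quiet time, request a major compaction for one partition by hand.
ALTER TABLE db.events PARTITION (dt='2019-01-01') COMPACT 'major';

-- List queued, running, and completed compaction requests and their state.
SHOW COMPACTIONS;
```

The ALTER TABLE ... COMPACT statement only enqueues the request; the SHOW COMPACTIONS output is where you can confirm when it has actually finished.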

 

Here is a community search for compaction, which returns many educational results:

 

https://community.cloudera.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=fal...

Re: Querying hive 3 transaction enabled table which is being streamed to

What should be done in a scenario where a maintenance window is not possible?

 

This quite contradicts what the documentation says:


"All compactions are done in the background and do not prevent concurrent reads and writes of the data.  After a compaction the system waits until all readers of the old files have finished and then removes the old files."

 

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Compactor