Hello, I have a transactional ACID table that is receiving data from a Spark streaming program. I would like to copy data out of this table into a non-ACID source but my results are inconsistent. Many times, my query will work and I am able to populate the target table, but sometimes I'll run into issues where the map reduce jobs don't launch while waiting for locks. The logs state 'Heart Beat' and then the job times out. What options are there to ensure consistent results?
set hive.support.concurrency=true; set hive.execution.engine=mr; set hive.enforce.bucketing=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.compactor.initiator.on=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.fetch.task.conversion=none; set hive.vectorized.execution.enabled=false; CREATE TABLE acid_enabled_table (foo timestamp) CLUSTERED BY(foo) INTO 3 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true'); --My query to copy data from ACID table to non-Acid table Insert into non_acid_table xyz select foo from acid_enabled_table where cond = 'abc';
Have you tried Disabling the ACID for that particular session and then try to run perform the query:-
1- Login to beeline or Hive CLI
2- Disable ACID for the session:-
3- Run the Query now to copy the data.