Hello, I have a transactional (ACID) Hive table that receives data from a Spark Streaming program. I would like to copy data out of this table into a non-ACID table, but my results are inconsistent. Most of the time the query works and the target table is populated, but sometimes the MapReduce jobs never launch because the query is stuck waiting for locks. The logs show 'Heart Beat' messages and then the job times out. What options are there to get consistent results?
My code:
-- Session settings for ACID support (DbTxnManager)
set hive.support.concurrency=true;
set hive.execution.engine=mr;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.compactor.initiator.on=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.fetch.task.conversion=none;
set hive.vectorized.execution.enabled=false;
-- ACID table written to by the Spark Streaming job
CREATE TABLE acid_enabled_table
(foo timestamp)
CLUSTERED BY(foo) INTO 3 BUCKETS
STORED AS ORC TBLPROPERTIES ('transactional'='true');
-- My query to copy data from the ACID table to the non-ACID table
INSERT INTO TABLE non_acid_table
SELECT foo FROM acid_enabled_table
WHERE cond = 'abc';
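To narrow down which lock the INSERT is waiting on while it logs heartbeats, the lock and transaction state can be inspected from a second Hive session. This is a sketch, assuming Hive 1.x or later with DbTxnManager as configured above; the exact output columns vary by Hive version.

```sql
-- Run from a second session while the INSERT is waiting on locks.

-- Locks currently held or requested on the ACID source table:
SHOW LOCKS acid_enabled_table;

-- Currently open and aborted transactions (e.g. ones held by the streaming writer):
SHOW TRANSACTIONS;

-- Compactions queued, running, or recently completed on ACID tables:
SHOW COMPACTIONS;
```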