Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here. Want to know more about what has changed? Check out the Community News blog.

Is it possible to insert data from an elasticsearch table via es-hadoop into a transactional hive table?

Highlighted

Is it possible to insert data from an elasticsearch table via es-hadoop into a transactional hive table?

New Contributor

I am using elasticsearch-hadoop-hive-6.3.2.jar library to run an archival of an es index that is 30 days old into a transactional hive table. We are using HDP 2.6.0, hive 1.2.1, elasticsearch 6.3.2 and cannot upgrade until we move some of our hardware to centos7. I create the transactional table using a create statement like so,

CREATE TABLE IF NOT EXISTS event (
...fields...
)
PARTITIONED BY (date string)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/archive/event'
TBLPROPERTIES ('transactional'='true');

Then create an external table over an elasticsearch index,

CREATE EXTERNAL TABLE event_es (
...fields...
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
    'es.resource' = 'events_day_2019_06_21',
    'es.nodes' = 'es-test-node.com:9200',
    'es.read.metadata' = 'true',
    'es.mapping.date.rich' = 'false'
);

I then try to run the query to perform the archival,

INSERT INTO TABLE event PARTITION (date)
 SELECT
  ...fields...
 FROM event_es;
Query ID = hdpuser_20190717103212_5b8edc1d-8ae1-4736-a389-f750d85ea86a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_38947589_0009)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED     32         32        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 9.51 s
--------------------------------------------------------------------------------
Loading data to table event partition (dt=null)
	 Time taken to load dynamic partitions: 0.105 seconds
	Loading partition {dt=2019-06-21}
	 Time taken for adding to write entity : 2
Partition event{dt=2019-06-21} stats: [numFiles=4, numRows=0, totalSize=43212, rawDataSize=0]
OK
Time taken: 12.384 seconds

Notice the numRows=0 when I run this yet numFiles and totalSize is increasing. If I recreate the event table without the transactional property (i.e. remove this clause TBLPROPERTIES ('transactional'='true')) then the insertion works. I've verified these settings in the hive cli as they are already set in our hive-site.xml

-- already set in hive-site.xml
SET hive.support.concurrency = true;
SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on = true;
SET hive.compactor.worker.threads = 1;

And tried updating the strict locking mode to false with no luck.

-- test setting strict locking mode to false (doesn't work)
SET hive.txn.strict.locking.mode=false;

Any pointers or suggestions would be greatly appreciated!