Created 01-16-2019 04:57 PM
Issue: Loading data into a Hive ORC table never finishes; I have to kill the load process manually.
I am trying to load data into an ORC Hive table from another Hive TEXTFILE table. Since the source files are TXT/JSON, I first load the data into the TEXTFILE table and then try to load it from there into the ORC table.
Cluster: HDP 2.6.5-292
Hive version: 1.2.1000.2.6.5.0-292
Here is the Hive TEXTFILE table schema:
Create external table if not exists TEXTTable(ID bigint, DOCUMENT_ID bigint, NUM varchar(20), SUBMITTER_ID bigint, FILING string, CODE varchar(10), RECEIPTNUM varchar(20))
row format delimited fields terminated by '|'
Location '/data/3rdPartyData/Hive/TEXTTable'
TBLPROPERTIES ('skip.header.line.count'='1');
Load Data into TEXTFILE table:
load data local inpath '/data/TextFile.txt' overwrite into table TEXTTable;
Here is the Hive ORC table schema:
Create external table if not exists ORCTable(ID bigint, DOCUMENT_ID bigint, NUM varchar(20), SUBMITTER_ID bigint, FILING TIMESTAMP, CODE varchar(10), RECEIPTNUM varchar(20))
row format delimited fields terminated by '|'
STORED as ORC
Location '/data/3rdPartyData/Hive/ORCTable'
TBLPROPERTIES ('orc.compress'='SNAPPY');
Load data into ORC table:
Insert overwrite table ORCTable select ID, DOCUMENT_ID, NUM, SUBMITTER_ID, from_unixtime(unix_timestamp(FILING, "yyyy-MM-dd'T'HH:mm:ss")) as FILING, CODE, RECEIPTNUM from TEXTTable;
Created 01-16-2019 09:07 PM
Since this behavior was odd, I rechecked all the memory usage and configuration and found the cause: the Tez memory was set to the YARN maximum memory. After reducing the Tez memory, the problem was fixed.
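For anyone hitting the same symptom, here is a rough sketch of what "reducing the Tez memory" could look like at session level. The post does not say which property was changed or what values were used, so the property choice and every number below are assumptions for illustration only; they should be sized against your own yarn.scheduler.maximum-allocation-mb and queue capacity.

```sql
-- Hypothetical values; the original post does not state the actual settings.
-- Assumption: the YARN maximum container size is well above 4096 MB, so a
-- Tez container of this size leaves room for the AM and other containers.
SET hive.tez.container.size=4096;

-- JVM heap for the Tez tasks, kept below the container size
-- (roughly 80% of it is a common rule of thumb, not a requirement).
SET hive.tez.java.opts=-Xmx3276m;

-- Memory for the Tez ApplicationMaster, also kept below the YARN maximum.
SET tez.am.resource.memory.mb=4096;

-- Re-run the load that previously hung.
INSERT OVERWRITE TABLE ORCTable
SELECT ID, DOCUMENT_ID, NUM, SUBMITTER_ID,
       from_unixtime(unix_timestamp(FILING, "yyyy-MM-dd'T'HH:mm:ss")) AS FILING,
       CODE, RECEIPTNUM
FROM TEXTTable;
```

On an HDP cluster the same properties are normally changed permanently through Ambari (Hive and Tez configs) rather than per session.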
Created 01-16-2019 05:42 PM
I have noticed that this is not limited to the ORC table; the same thing happens with a normal table. It happens when loading data from another table, whereas loading data from a source file into a table works fine.
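The difference fits how Hive runs the two statements: LOAD DATA only moves files into the table location, while INSERT ... SELECT launches a job on the execution engine and has to obtain YARN containers first. As a rough sketch, the current values of the relevant settings can be printed from the Hive session; which properties matter on a given cluster is an assumption here, and a property that is not set is reported as undefined.

```sql
-- Print the current value of each property (SET with no value only reads it).
SET hive.execution.engine;                  -- expected to be tez on this cluster
SET hive.tez.container.size;                -- memory requested per Tez container (MB)
SET tez.am.resource.memory.mb;              -- memory requested for the Tez AM (MB)
SET yarn.scheduler.maximum-allocation-mb;   -- largest container YARN will grant (MB)
```

If the Tez values are equal to the YARN maximum, the job may be left waiting for containers that the queue cannot provide.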