Created 05-16-2019 05:53 PM
When using union all with lateral view explode, the result data is not loaded to Hive table.
This issue only happens with Tez, not MapReduce.
Not working for Tez
hive> create table t1 as
> SELECT
> key, value
> FROM (
> SELECT MAP('A', '1', 'B', '2') as event
>
> UNION ALL
>
> SELECT MAP('A', '3', 'B', '4') as event
> ) s
> LATERAL VIEW EXPLODE( s.event ) t AS key, value;
Query ID = jinyang_li_20190516053833_43773273-2986-44a3-a827-4a7be1c3c42b
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1556926264520_11548)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Map 3 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 3.92 s
----------------------------------------------------------------------------------------------
OK
Time taken: 5.905 seconds
hive> select * from t1;
OK
Time taken: 1.495 seconds
Works for MR
hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> create table t2 as
> SELECT
> key, value
> FROM (
> SELECT MAP('A', '1', 'B', '2') as event
>
> UNION ALL
>
> SELECT MAP('A', '3', 'B', '4') as event
> ) s
> LATERAL VIEW EXPLODE( s.event ) t AS key, value;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = jinyang_li_20190516054003_ea895a58-5ec9-4790-aa0e-c9e1b420bbc1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1556926264520_11549, Tracking URL = http://ip-172-21-182-163.ec2.internal:20888/proxy/application_1556926264520_11549/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1556926264520_11549
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-05-16 05:40:10,139 Stage-1 map = 0%, reduce = 0%
2019-05-16 05:40:15,411 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.36 sec
MapReduce Total cumulative CPU time: 2 seconds 360 msec
Ended Job = job_1556926264520_11549
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://ip-172-21-182-163.ec2.internal:8020/user/hive/warehouse/.hive-staging_hive_2019-05-16_05-40-03_842_1554057741424461900-1/-ext-10001
Moving data to directory hdfs://ip-172-21-182-163.ec2.internal:8020/user/hive/warehouse/t2
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.36 sec HDFS Read: 335 HDFS Write: 82 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 360 msec
OK
Time taken: 13.535 seconds
hive> select * from t2;
OK
A 1
B 2
A 3
B 4
Time taken: 0.115 seconds, Fetched: 4 row(s)
Created 05-16-2019 06:52 PM
The query plan shows no Move operator, which should be responsible for loading data to table location.
hive> explain
> create table t3 as
> select
> key, value
> from (
> SELECT MAP('A', '1', 'B', '2') as event
> UNION ALL
> SELECT MAP('A', '3', 'B', '4') as event
> ) s
> LATERAL VIEW EXPLODE( s.event ) t AS key, value;
OK
Plan not optimized by CBO.
Vertex dependency in root stage
Map 1 <- Union 2 (CONTAINS)
Map 3 <- Union 2 (CONTAINS)
Stage-3
Create Table Operator:
name:default.t3
Stage-1
Union 2
<-Map 1 [CONTAINS]
File Output Operator [FS_12]
table:{"name:":"default.t3"}
Select Operator [SEL_11] (rows=4 width=8)
Output:["_col0","_col1"]
Lateral View Join Operator [LVJ_10] (rows=4 width=810)
Output:["_col1","_col2"]
Select Operator [SEL_7] (rows=2 width=1080)
Lateral View Forward [LVF_6] (rows=2 width=540)
Select Operator [SEL_1] (rows=1 width=540)
Output:["_col0"]
TableScan [TS_0] (rows=1 width=1)
File Output Operator [FS_12]
table:{"name:":"default.t3"}
Select Operator [SEL_11] (rows=4 width=8)
Output:["_col0","_col1"]
Lateral View Join Operator [LVJ_10] (rows=4 width=810)
Output:["_col1","_col2"]
UDTF Operator [UDTF_9] (rows=2 width=540)
function name:explode
Select Operator [SEL_8] (rows=2 width=540)
Output:["_col0"]
Please refer to the previous Lateral View Forward [LVF_6]
<-Map 3 [CONTAINS]
File Output Operator [FS_12]
table:{"name:":"default.t3"}
Select Operator [SEL_11] (rows=4 width=8)
Output:["_col0","_col1"]
Lateral View Join Operator [LVJ_10] (rows=4 width=810)
Output:["_col1","_col2"]
Select Operator [SEL_7] (rows=2 width=1080)
Lateral View Forward [LVF_6] (rows=2 width=540)
Select Operator [SEL_3] (rows=1 width=540)
Output:["_col0"]
TableScan [TS_2] (rows=1 width=1)
File Output Operator [FS_12]
table:{"name:":"default.t3"}
Select Operator [SEL_11] (rows=4 width=8)
Output:["_col0","_col1"]
Lateral View Join Operator [LVJ_10] (rows=4 width=810)
Output:["_col1","_col2"]
UDTF Operator [UDTF_9] (rows=2 width=540)
function name:explode
Select Operator [SEL_8] (rows=2 width=540)
Output:["_col0"]
Please refer to the previous Lateral View Forward [LVF_6]