Created on 04-28-2017 11:12 PM
Hive is having an improvised explain plan with the hive.explain.user set to true. Now the indentation and arrows clearly depict the execution flow. Retrieving the join order needed a lot of back and forth references in the old style which was made as easy as following the arrows now, even with the details of mappers and reducers.
Let us take a look at one sample explain plan and retrieve the join order
Plan optimized by CBO. Vertex dependency in root stage Map 2 <- Map 1 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) Reducer 3 <- Map 2 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 3 vectorized, llap File Output Operator [FS_101] Group By Operator [GBY_100] (rows=1 width=736) Output:["_col0","_col1","_col2","_col3"],aggregations:["min(VALUE._col0)","min(VALUE._col1)","min(VALUE._col2)","min(VALUE._col3)"] <-Map 2 [SIMPLE_EDGE] llap SHUFFLE [RS_45] Group By Operator [GBY_44] (rows=1 width=736) Output:["_col0","_col1","_col2","_col3"],aggregations:["min(_col1)","min(_col3)","min(_col14)","min(_col19)"] Map Join Operator [MAPJOIN_92] (rows=80573328660391072 width=114) Conds:MAPJOIN_91._col11, _col7=RS_41._col0, _col0(Inner),Output:["_col1","_col3","_col14","_col19"] <-Map 9 [BROADCAST_EDGE] llap BROADCAST [RS_41] PartitionCols:_col0, _col0 Select Operator [SEL_23] (rows=1960358 width=104) Output:["_col0","_col1"] Filter Operator [FIL_86] (rows=1960358 width=104) predicate:id is not null TableScan [TS_21] (rows=1960358 width=104) default@title,t,Tbl:COMPLETE,Col:NONE,Output:["id","title"] <-Map Join Operator [MAPJOIN_91] (rows=73248479012737488 width=125) Conds:MAPJOIN_90._col10=RS_38._col0(Inner),Output:["_col1","_col3","_col7","_col11","_col14"] <-Map 8 [BROADCAST_EDGE] llap BROADCAST [RS_38] PartitionCols:_col0 Select Operator [SEL_20] (rows=2 width=40) Output:["_col0"] Filter Operator [FIL_85] (rows=2 width=40) predicate:((UDFToString(role) = 'actress') and id is not null) TableScan [TS_18] (rows=4 width=40) default@role_type,rt,Tbl:COMPLETE,Col:NONE,Output:["id","role"] <-Map Join Operator [MAPJOIN_90] (rows=66589524931926816 width=138) Conds:MAPJOIN_89._col0, _col6=RS_35._col0, _col0(Inner),Output:["_col1","_col3","_col7","_col10","_col11","_col14"] <-Map 7 [BROADCAST_EDGE] llap BROADCAST [RS_35] PartitionCols:_col0, _col0 Select Operator [SEL_17] (rows=764741 width=105) Output:["_col0","_col1"] Filter Operator [FIL_84] (rows=764741 width=105) predicate:((UDFToString(gender) = 'f') and (name like '%An%') and id is not null) TableScan [TS_15] (rows=3058966 width=105) default@name,n,Tbl:COMPLETE,Col:NONE,Output:["id","name","gender"] <-Map Join Operator [MAPJOIN_89] (rows=60535930444216152 width=152) Conds:MAPJOIN_88._col7, _col4=RS_32._col0, _col1(Inner),Output:["_col0","_col1","_col3","_col6","_col7","_col10","_col11"] <-Map 6 [BROADCAST_EDGE] llap BROADCAST [RS_32] PartitionCols:_col0, _col1 Select Operator [SEL_14] (rows=11636372 width=8) Output:["_col0","_col1"] Filter Operator [FIL_83] (rows=11636372 width=8) predicate:(movie_id is not null and company_id is not null) TableScan [TS_12] (rows=11636372 width=8) default@movie_companies,mc,Tbl:COMPLETE,Col:NONE,Output:["movie_id","company_id"] <-Map Join Operator [MAPJOIN_88] (rows=55032662847395656 width=167) Conds:MAPJOIN_87._col0, _col2=RS_29._col0, _col2(Inner),Output:["_col0","_col1","_col3","_col4","_col6","_col7","_col10"] <-Map 4 [BROADCAST_EDGE] llap BROADCAST [RS_29] PartitionCols:_col0, _col2 Select Operator [SEL_8] (rows=6110483 width=116) Output:["_col0","_col1","_col2","_col4"] Filter Operator [FIL_81] (rows=6110483 width=116) predicate:((note) IN ('(voice)', '(voice: Japanese version)', '(voice) (uncredited)', '(voice: English version)') and person_id is not null and person_role_id is not null and movie_id is not null and role_id is not null) TableScan [TS_6] (rows=12220966 width=116) default@cast_info,ci,Tbl:COMPLETE,Col:NONE,Output:["person_id","movie_id","person_role_id","note","role_id"] <-Map Join Operator [MAPJOIN_87] (rows=50029692413268030 width=184) Conds:(Inner),(Inner),Output:["_col0","_col1","_col2","_col3","_col4"] <-Map 1 [BROADCAST_EDGE] llap BROADCAST [RS_24] Select Operator [SEL_2] (rows=701938 width=104) Output:["_col0","_col1"] Filter Operator [FIL_79] (rows=701938 width=104) predicate:person_id is not null TableScan [TS_0] (rows=701938 width=104) default@aka_name,an,Tbl:COMPLETE,Col:NONE,Output:["person_id","name"] <-Map 5 [BROADCAST_EDGE] llap BROADCAST [RS_26] Select Operator [SEL_11] (rows=34365 width=259) Output:["_col0"] Filter Operator [FIL_82] (rows=34365 width=259) predicate:((UDFToString(country_code) = '[us]') and id is not null) TableScan [TS_9] (rows=68731 width=259) default@company_name,cn,Tbl:COMPLETE,Col:NONE,Output:["id","country_code"] <-Select Operator [SEL_5] (rows=2074019 width=104) Output:["_col0","_col1"] Filter Operator [FIL_80] (rows=2074019 width=104) predicate:id is not null TableScan [TS_3] (rows=2074019 width=104) default@char_name,chn,Tbl:COMPLETE,Col:NONE,Output:["id","name"]
1. Start reading from the bottom
2. Look for the keyword 'TableScan', the row below it will have the details of the tables being joined
3. Look for the keyword 'Join Operator', the indent of this line will tell us the tables that are part of this join
4. This joined table group will form candidate for another join if the line starts with '<-'
5. Similarly keep going to top to get the complete join order
The join order in the above example boils down to (title(t),(((movie_companies(mc),(cast_info(ci),(company_name(cn),aka_name(an),char_name(chn)))),name(n)),role_type(rt)))