Community Articles

Find and share helpful community-sourced technical articles.
Labels (1)
avatar
Rising Star

Hive is having an improvised explain plan with the hive.explain.user set to true. Now the indentation and arrows clearly depict the execution flow. Retrieving the join order needed a lot of back and forth references in the old style which was made as easy as following the arrows now, even with the details of mappers and reducers.

Let us take a look at one sample explain plan and retrieve the join order

Plan optimized by CBO.
Vertex dependency in root stage
Map 2 <- Map 1 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3 vectorized, llap
      File Output Operator [FS_101]
        Group By Operator [GBY_100] (rows=1 width=736)
          Output:["_col0","_col1","_col2","_col3"],aggregations:["min(VALUE._col0)","min(VALUE._col1)","min(VALUE._col2)","min(VALUE._col3)"]
        <-Map 2 [SIMPLE_EDGE] llap
          SHUFFLE [RS_45]
            Group By Operator [GBY_44] (rows=1 width=736)
              Output:["_col0","_col1","_col2","_col3"],aggregations:["min(_col1)","min(_col3)","min(_col14)","min(_col19)"]
              Map Join Operator [MAPJOIN_92] (rows=80573328660391072 width=114)
                Conds:MAPJOIN_91._col11, _col7=RS_41._col0, _col0(Inner),Output:["_col1","_col3","_col14","_col19"]
              <-Map 9 [BROADCAST_EDGE] llap
                BROADCAST [RS_41]
                  PartitionCols:_col0, _col0
                  Select Operator [SEL_23] (rows=1960358 width=104)
                    Output:["_col0","_col1"]
                    Filter Operator [FIL_86] (rows=1960358 width=104)
                      predicate:id is not null
                      TableScan [TS_21] (rows=1960358 width=104)
                        default@title,t,Tbl:COMPLETE,Col:NONE,Output:["id","title"]
              <-Map Join Operator [MAPJOIN_91] (rows=73248479012737488 width=125)
                  Conds:MAPJOIN_90._col10=RS_38._col0(Inner),Output:["_col1","_col3","_col7","_col11","_col14"]
                <-Map 8 [BROADCAST_EDGE] llap
                  BROADCAST [RS_38]
                    PartitionCols:_col0
                    Select Operator [SEL_20] (rows=2 width=40)
                      Output:["_col0"]
                      Filter Operator [FIL_85] (rows=2 width=40)
                        predicate:((UDFToString(role) = 'actress') and id is not null)
                        TableScan [TS_18] (rows=4 width=40)
                          default@role_type,rt,Tbl:COMPLETE,Col:NONE,Output:["id","role"]
                <-Map Join Operator [MAPJOIN_90] (rows=66589524931926816 width=138)
                    Conds:MAPJOIN_89._col0, _col6=RS_35._col0, _col0(Inner),Output:["_col1","_col3","_col7","_col10","_col11","_col14"]
                  <-Map 7 [BROADCAST_EDGE] llap
                    BROADCAST [RS_35]
                      PartitionCols:_col0, _col0
                      Select Operator [SEL_17] (rows=764741 width=105)
                        Output:["_col0","_col1"]
                        Filter Operator [FIL_84] (rows=764741 width=105)
                          predicate:((UDFToString(gender) = 'f') and (name like '%An%') and id is not null)
                          TableScan [TS_15] (rows=3058966 width=105)
                            default@name,n,Tbl:COMPLETE,Col:NONE,Output:["id","name","gender"]
                  <-Map Join Operator [MAPJOIN_89] (rows=60535930444216152 width=152)
                      Conds:MAPJOIN_88._col7, _col4=RS_32._col0, _col1(Inner),Output:["_col0","_col1","_col3","_col6","_col7","_col10","_col11"]
                    <-Map 6 [BROADCAST_EDGE] llap
                      BROADCAST [RS_32]
                        PartitionCols:_col0, _col1
                        Select Operator [SEL_14] (rows=11636372 width=8)
                          Output:["_col0","_col1"]
                          Filter Operator [FIL_83] (rows=11636372 width=8)
                            predicate:(movie_id is not null and company_id is not null)
                            TableScan [TS_12] (rows=11636372 width=8)
                              default@movie_companies,mc,Tbl:COMPLETE,Col:NONE,Output:["movie_id","company_id"]
                    <-Map Join Operator [MAPJOIN_88] (rows=55032662847395656 width=167)
                        Conds:MAPJOIN_87._col0, _col2=RS_29._col0, _col2(Inner),Output:["_col0","_col1","_col3","_col4","_col6","_col7","_col10"]
                      <-Map 4 [BROADCAST_EDGE] llap
                        BROADCAST [RS_29]
                          PartitionCols:_col0, _col2
                          Select Operator [SEL_8] (rows=6110483 width=116)
                            Output:["_col0","_col1","_col2","_col4"]
                            Filter Operator [FIL_81] (rows=6110483 width=116)
                              predicate:((note) IN ('(voice)', '(voice: Japanese version)', '(voice) (uncredited)', '(voice: English version)') and person_id is not null and person_role_id is not null and movie_id is not null and role_id is not null)
                              TableScan [TS_6] (rows=12220966 width=116)
                                default@cast_info,ci,Tbl:COMPLETE,Col:NONE,Output:["person_id","movie_id","person_role_id","note","role_id"]
                      <-Map Join Operator [MAPJOIN_87] (rows=50029692413268030 width=184)
                          Conds:(Inner),(Inner),Output:["_col0","_col1","_col2","_col3","_col4"]
                        <-Map 1 [BROADCAST_EDGE] llap
                          BROADCAST [RS_24]
                            Select Operator [SEL_2] (rows=701938 width=104)
                              Output:["_col0","_col1"]
                              Filter Operator [FIL_79] (rows=701938 width=104)
                                predicate:person_id is not null
                                TableScan [TS_0] (rows=701938 width=104)
                                  default@aka_name,an,Tbl:COMPLETE,Col:NONE,Output:["person_id","name"]
                        <-Map 5 [BROADCAST_EDGE] llap
                          BROADCAST [RS_26]
                            Select Operator [SEL_11] (rows=34365 width=259)
                              Output:["_col0"]
                              Filter Operator [FIL_82] (rows=34365 width=259)
                                predicate:((UDFToString(country_code) = '[us]') and id is not null)
                                TableScan [TS_9] (rows=68731 width=259)
                                  default@company_name,cn,Tbl:COMPLETE,Col:NONE,Output:["id","country_code"]
                        <-Select Operator [SEL_5] (rows=2074019 width=104)
                            Output:["_col0","_col1"]
                            Filter Operator [FIL_80] (rows=2074019 width=104)
                              predicate:id is not null
                              TableScan [TS_3] (rows=2074019 width=104)
                                default@char_name,chn,Tbl:COMPLETE,Col:NONE,Output:["id","name"]

1. Start reading from the bottom

2. Look for the keyword 'TableScan', the row below it will have the details of the tables being joined

3. Look for the keyword 'Join Operator', the indent of this line will tell us the tables that are part of this join

4. This joined table group will form candidate for another join if the line starts with '<-'

5. Similarly keep going to top to get the complete join order

The join order in the above example boils down to (title(t),(((movie_companies(mc),(cast_info(ci),(company_name(cn),aka_name(an),char_name(chn)))),name(n)),role_type(rt)))

1,429 Views