Support Questions

Find answers, ask questions, and share your expertise

Hive explain plan fetching wrong row numbers

avatar
Contributor

Hi All,

My table has 3776 rows .

1) when i do :

explain select * from table

Statistics: Num rows: 274 Data size: 110912 Basic stats: COMPLETE Column stats: NONE

2) when i do :

explain select count(*) from table;

Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE

3) when i do :

explain select partner_id from table:

Statistics: Num rows: 27728 Data size: 110912 Basic stats: COMPLETE Column stats: NONE

Appreciate your help

3 REPLIES 3

avatar

Hi @Satish Anjaneyappa!

Did you try to run analyze table?

hive> analyze table salaries compute statistics;
Query ID = hive_20180624055914_ad06fa7a-ae16-4658-a4fb-5eabc1c2425a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1529818303832_0002)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 4.54 s     
--------------------------------------------------------------------------------
Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
OK
Time taken: 5.64 seconds

And I have a few questions about your issue:

- What kinda table we're talking? External or Managed? Could you share with us the describe formatted output from your table?

- Are you running hive on tez or mr? And is there any specific file format for this table?

More details about the analyze on the link below 🙂
https://cwiki.apache.org/confluence/display/Hive/StatsDev

Hope this helps!

avatar
Contributor

Hi Vinicius,

Yes its

1) Managed table

2) Engine is MR

3) Its a text format

hive> show create table payout_curreny_master_temp ;
OK
CREATE TABLE `payout_curreny_master_temp`(
  `partner_id` int,
  `program_name` string,
  `program_type` string,
  `payout_currency` string,
  `delete_flag` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'=',',
  'serialization.format'=',')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/xyz/payout_curreny_master_temp'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='110912',
  'transient_lastDdlTime'='1529665548')

avatar

Hi @Satish Anjaneyappa!

Hm, what about the ANALYZE TABLE <TBL_NAME> COMPUTE STATISTICS?
I made a test here, and it's doing good so far:

--TABLE HAS 50 ROWS!
0: jdbc:hive2://node3:10000/default> CREATE EXTERNAL TABLE `salaries`(                               
0: jdbc:hive2://node3:10000/default>   `gender` string,                                             
0: jdbc:hive2://node3:10000/default>   `age` int,                                                   
0: jdbc:hive2://node3:10000/default>   `salary` double,                                             
0: jdbc:hive2://node3:10000/default>   `zip` int)                                                   
0: jdbc:hive2://node3:10000/default> ROW FORMAT DELIMITED                                           
0: jdbc:hive2://node3:10000/default>   FIELDS TERMINATED BY ','                                     
0: jdbc:hive2://node3:10000/default> STORED AS INPUTFORMAT                                           
0: jdbc:hive2://node3:10000/default>   'org.apache.hadoop.mapred.TextInputFormat'                   
0: jdbc:hive2://node3:10000/default> OUTPUTFORMAT                                                   
0: jdbc:hive2://node3:10000/default>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
0: jdbc:hive2://node3:10000/default> LOCATION                                                       
0: jdbc:hive2://node3:10000/default>   'hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries'       
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (                                                 
0: jdbc:hive2://node3:10000/default>   'COLUMN_STATS_ACCURATE'='true',         
0: jdbc:hive2://node3:10000/default>   'numFiles'='1',                                               
0: jdbc:hive2://node3:10000/default>   'numRows'='0',                                               
0: jdbc:hive2://node3:10000/default>   'rawDataSize'='732',                                         
0: jdbc:hive2://node3:10000/default>   'totalSize'='781',                                           
0: jdbc:hive2://node3:10000/default>   'transient_lastDdlTime'='1529819960');
No rows affected (0.443 seconds)
0: jdbc:hive2://node3:10000/default> explain select count(1) from salaries;
+------------------------------------------------------------------------------------------------------+--+
|                                               Explain                                                |
+------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                  |
|   Stage-1 is a root stage                                                                            |
|   Stage-0 depends on stages: Stage-1                                                                 |
|                                                                                                      |
| STAGE PLANS:                                                                                         |
|   Stage: Stage-1                                                                                     |
|     Map Reduce                                                                                       |
|       Map Operator Tree:                                                                             |
|           TableScan                                                                                  |
|             alias: salaries                                                                          |
|             Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE      |
|             Select Operator                                                                          |
|               Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE    |
|               Group By Operator                                                                      |
|                 aggregations: count(1)                                                               |
|                 mode: hash                                                                           |
|                 outputColumnNames: _col0                                                             |
|                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE    |
|                 Reduce Output Operator                                                               |
|                   sort order:                                                                        |
|                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE  |
|                   value expressions: _col0 (type: bigint)                                            |
|       Reduce Operator Tree:                                                                          |
|         Group By Operator                                                                            |
|           aggregations: count(VALUE._col0)                                                           |
|           mode: mergepartial                                                                         |
|           outputColumnNames: _col0                                                                   |
|           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE          |
|           File Output Operator                                                                       |
|             compressed: false                                                                        |
|             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE        |
|             table:                                                                                   |
|                 input format: org.apache.hadoop.mapred.TextInputFormat                               |
|                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat            |
|                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                            |
|                                                                                                      |
|   Stage: Stage-0                                                                                     |
|     Fetch Operator                                                                                   |
|       limit: -1                                                                                      |
|       Processor Tree:                                                                                |
|         ListSink                                                                                     |
|                                                                                                      |
+------------------------------------------------------------------------------------------------------+--+
42 rows selected (0.255 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
|                                                 Explain                                                  |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                      |
|   Stage-0 is a root stage                                                                                |
|                                                                                                          |
| STAGE PLANS:                                                                                             |
|   Stage: Stage-0                                                                                         |
|     Fetch Operator                                                                                       |
|       limit: -1                                                                                          |
|       Processor Tree:                                                                                    |
|         TableScan                                                                                        |
|           alias: salaries                                                                                |
|           Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE                |
|           Select Operator                                                                                |
|             expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int)  |
|             outputColumnNames: _col0, _col1, _col2, _col3                                                |
|             Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE              |
|             ListSink                                                                                     |
|                                                                                                          |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.232 seconds)
0: jdbc:hive2://node3:10000/default> desc salaries;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| gender    | string     |          |
| age       | int        |          |
| salary    | double     |          |
| zip       | int        |          |
+-----------+------------+----------+--+
4 rows selected (0.426 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+------------------------------------------------------------------------------------------------+--+
|                                            Explain                                             |
+------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                            |
|   Stage-0 is a root stage                                                                      |
|                                                                                                |
| STAGE PLANS:                                                                                   |
|   Stage: Stage-0                                                                               |
|     Fetch Operator                                                                             |
|       limit: -1                                                                                |
|       Processor Tree:                                                                          |
|         TableScan                                                                              |
|           alias: salaries                                                                      |
|           Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE    |
|           Select Operator                                                                      |
|             expressions: age (type: int)                                                       |
|             outputColumnNames: _col0                                                           |
|             Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE  |
|             ListSink                                                                           |
|                                                                                                |
+------------------------------------------------------------------------------------------------+--+
0: jdbc:hive2://node3:10000/default> analyze table salaries compute statistics ;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1529940007017_0004
INFO  : The url to track the job: http://node4:8088/proxy/application_1529940007017_0004/
INFO  : Starting Job = job_1529940007017_0004, Tracking URL = http://node4:8088/proxy/application_1529940007017_0004/
INFO  : Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job  -kill job_1529940007017_0004
INFO  : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO  : 2018-06-28 16:38:56,357 Stage-0 map = 0%,  reduce = 0%
INFO  : 2018-06-28 16:39:02,796 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 2.93 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 930 msec
INFO  : Ended Job = job_1529940007017_0004
INFO  : Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
No rows affected (16.338 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
|                                                 Explain                                                  |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                      |
|   Stage-0 is a root stage                                                                                |
|                                                                                                          |
| STAGE PLANS:                                                                                             |
|   Stage: Stage-0                                                                                         |
|     Fetch Operator                                                                                       |
|       limit: -1                                                                                          |
|       Processor Tree:                                                                                    |
|         TableScan                                                                                        |
|           alias: salaries                                                                                |
|           Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE               |
|           Select Operator                                                                                |
|             expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int)  |
|             outputColumnNames: _col0, _col1, _col2, _col3                                                |
|             Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE             |
|             ListSink                                                                                     |
|                                                                                                          |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.226 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+-----------------------------------------------------------------------------------------------+--+
|                                            Explain                                            |
+-----------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                           |
|   Stage-0 is a root stage                                                                     |
|                                                                                               |
| STAGE PLANS:                                                                                  |
|   Stage: Stage-0                                                                              |
|     Fetch Operator                                                                            |
|       limit: -1                                                                               |
|       Processor Tree:                                                                         |
|         TableScan                                                                             |
|           alias: salaries                                                                     |
|           Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE    |
|           Select Operator                                                                     |
|             expressions: age (type: int)                                                      |
|             outputColumnNames: _col0                                                          |
|             Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE  |
|             ListSink                                                                          |
|                                                                                               |
+-----------------------------------------------------------------------------------------------+--+
17 rows selected (0.227 seconds)
And also, try to set the
set hive.stats.autogather=true;

Hope this helps!