Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive explain plan fetching wrong row numbers

avatar
Contributor

Hi All,

My table has 3776 rows .

1) when i do :

explain select * from table

Statistics: Num rows: 274 Data size: 110912 Basic stats: COMPLETE Column stats: NONE

2) when i do :

explain select count(*) from table;

Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE

3) when i do :

explain select partner_id from table:

Statistics: Num rows: 27728 Data size: 110912 Basic stats: COMPLETE Column stats: NONE

Appreciate your help

3 REPLIES 3

avatar

Hi @Satish Anjaneyappa!

Did you try to run analyze table?

hive> analyze table salaries compute statistics;
Query ID = hive_20180624055914_ad06fa7a-ae16-4658-a4fb-5eabc1c2425a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1529818303832_0002)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 4.54 s     
--------------------------------------------------------------------------------
Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
OK
Time taken: 5.64 seconds

And I have a few questions about your issue:

- What kinda table we're talking? External or Managed? Could you share with us the describe formatted output from your table?

- Are you running hive on tez or mr? And is there any specific file format for this table?

More details about the analyze on the link below 🙂
https://cwiki.apache.org/confluence/display/Hive/StatsDev

Hope this helps!

avatar
Contributor

Hi Vinicius,

Yes its

1) Managed table

2) Engine is MR

3) Its a text format

hive> show create table payout_curreny_master_temp ;
OK
CREATE TABLE `payout_curreny_master_temp`(
  `partner_id` int,
  `program_name` string,
  `program_type` string,
  `payout_currency` string,
  `delete_flag` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'=',',
  'serialization.format'=',')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/xyz/payout_curreny_master_temp'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='110912',
  'transient_lastDdlTime'='1529665548')

avatar

Hi @Satish Anjaneyappa!

Hm, what about the ANALYZE TABLE <TBL_NAME> COMPUTE STATISTICS?
I made a test here, and it's doing good so far:

--TABLE HAS 50 ROWS!
0: jdbc:hive2://node3:10000/default> CREATE EXTERNAL TABLE `salaries`(                               
0: jdbc:hive2://node3:10000/default>   `gender` string,                                             
0: jdbc:hive2://node3:10000/default>   `age` int,                                                   
0: jdbc:hive2://node3:10000/default>   `salary` double,                                             
0: jdbc:hive2://node3:10000/default>   `zip` int)                                                   
0: jdbc:hive2://node3:10000/default> ROW FORMAT DELIMITED                                           
0: jdbc:hive2://node3:10000/default>   FIELDS TERMINATED BY ','                                     
0: jdbc:hive2://node3:10000/default> STORED AS INPUTFORMAT                                           
0: jdbc:hive2://node3:10000/default>   'org.apache.hadoop.mapred.TextInputFormat'                   
0: jdbc:hive2://node3:10000/default> OUTPUTFORMAT                                                   
0: jdbc:hive2://node3:10000/default>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
0: jdbc:hive2://node3:10000/default> LOCATION                                                       
0: jdbc:hive2://node3:10000/default>   'hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries'       
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (                                                 
0: jdbc:hive2://node3:10000/default>   'COLUMN_STATS_ACCURATE'='true',         
0: jdbc:hive2://node3:10000/default>   'numFiles'='1',                                               
0: jdbc:hive2://node3:10000/default>   'numRows'='0',                                               
0: jdbc:hive2://node3:10000/default>   'rawDataSize'='732',                                         
0: jdbc:hive2://node3:10000/default>   'totalSize'='781',                                           
0: jdbc:hive2://node3:10000/default>   'transient_lastDdlTime'='1529819960');
No rows affected (0.443 seconds)
0: jdbc:hive2://node3:10000/default> explain select count(1) from salaries;
+------------------------------------------------------------------------------------------------------+--+
|                                               Explain                                                |
+------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                  |
|   Stage-1 is a root stage                                                                            |
|   Stage-0 depends on stages: Stage-1                                                                 |
|                                                                                                      |
| STAGE PLANS:                                                                                         |
|   Stage: Stage-1                                                                                     |
|     Map Reduce                                                                                       |
|       Map Operator Tree:                                                                             |
|           TableScan                                                                                  |
|             alias: salaries                                                                          |
|             Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE      |
|             Select Operator                                                                          |
|               Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE    |
|               Group By Operator                                                                      |
|                 aggregations: count(1)                                                               |
|                 mode: hash                                                                           |
|                 outputColumnNames: _col0                                                             |
|                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE    |
|                 Reduce Output Operator                                                               |
|                   sort order:                                                                        |
|                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE  |
|                   value expressions: _col0 (type: bigint)                                            |
|       Reduce Operator Tree:                                                                          |
|         Group By Operator                                                                            |
|           aggregations: count(VALUE._col0)                                                           |
|           mode: mergepartial                                                                         |
|           outputColumnNames: _col0                                                                   |
|           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE          |
|           File Output Operator                                                                       |
|             compressed: false                                                                        |
|             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE        |
|             table:                                                                                   |
|                 input format: org.apache.hadoop.mapred.TextInputFormat                               |
|                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat            |
|                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                            |
|                                                                                                      |
|   Stage: Stage-0                                                                                     |
|     Fetch Operator                                                                                   |
|       limit: -1                                                                                      |
|       Processor Tree:                                                                                |
|         ListSink                                                                                     |
|                                                                                                      |
+------------------------------------------------------------------------------------------------------+--+
42 rows selected (0.255 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
|                                                 Explain                                                  |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                      |
|   Stage-0 is a root stage                                                                                |
|                                                                                                          |
| STAGE PLANS:                                                                                             |
|   Stage: Stage-0                                                                                         |
|     Fetch Operator                                                                                       |
|       limit: -1                                                                                          |
|       Processor Tree:                                                                                    |
|         TableScan                                                                                        |
|           alias: salaries                                                                                |
|           Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE                |
|           Select Operator                                                                                |
|             expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int)  |
|             outputColumnNames: _col0, _col1, _col2, _col3                                                |
|             Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE              |
|             ListSink                                                                                     |
|                                                                                                          |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.232 seconds)
0: jdbc:hive2://node3:10000/default> desc salaries;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| gender    | string     |          |
| age       | int        |          |
| salary    | double     |          |
| zip       | int        |          |
+-----------+------------+----------+--+
4 rows selected (0.426 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+------------------------------------------------------------------------------------------------+--+
|                                            Explain                                             |
+------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                            |
|   Stage-0 is a root stage                                                                      |
|                                                                                                |
| STAGE PLANS:                                                                                   |
|   Stage: Stage-0                                                                               |
|     Fetch Operator                                                                             |
|       limit: -1                                                                                |
|       Processor Tree:                                                                          |
|         TableScan                                                                              |
|           alias: salaries                                                                      |
|           Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE    |
|           Select Operator                                                                      |
|             expressions: age (type: int)                                                       |
|             outputColumnNames: _col0                                                           |
|             Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE  |
|             ListSink                                                                           |
|                                                                                                |
+------------------------------------------------------------------------------------------------+--+
0: jdbc:hive2://node3:10000/default> analyze table salaries compute statistics ;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1529940007017_0004
INFO  : The url to track the job: http://node4:8088/proxy/application_1529940007017_0004/
INFO  : Starting Job = job_1529940007017_0004, Tracking URL = http://node4:8088/proxy/application_1529940007017_0004/
INFO  : Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job  -kill job_1529940007017_0004
INFO  : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO  : 2018-06-28 16:38:56,357 Stage-0 map = 0%,  reduce = 0%
INFO  : 2018-06-28 16:39:02,796 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 2.93 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 930 msec
INFO  : Ended Job = job_1529940007017_0004
INFO  : Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
No rows affected (16.338 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
|                                                 Explain                                                  |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                                      |
|   Stage-0 is a root stage                                                                                |
|                                                                                                          |
| STAGE PLANS:                                                                                             |
|   Stage: Stage-0                                                                                         |
|     Fetch Operator                                                                                       |
|       limit: -1                                                                                          |
|       Processor Tree:                                                                                    |
|         TableScan                                                                                        |
|           alias: salaries                                                                                |
|           Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE               |
|           Select Operator                                                                                |
|             expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int)  |
|             outputColumnNames: _col0, _col1, _col2, _col3                                                |
|             Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE             |
|             ListSink                                                                                     |
|                                                                                                          |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.226 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+-----------------------------------------------------------------------------------------------+--+
|                                            Explain                                            |
+-----------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES:                                                                           |
|   Stage-0 is a root stage                                                                     |
|                                                                                               |
| STAGE PLANS:                                                                                  |
|   Stage: Stage-0                                                                              |
|     Fetch Operator                                                                            |
|       limit: -1                                                                               |
|       Processor Tree:                                                                         |
|         TableScan                                                                             |
|           alias: salaries                                                                     |
|           Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE    |
|           Select Operator                                                                     |
|             expressions: age (type: int)                                                      |
|             outputColumnNames: _col0                                                          |
|             Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE  |
|             ListSink                                                                          |
|                                                                                               |
+-----------------------------------------------------------------------------------------------+--+
17 rows selected (0.227 seconds)
And also, try to set the
set hive.stats.autogather=true;

Hope this helps!