Created 06-23-2018 04:30 PM
Hi All,
My table has 3776 rows .
1) when i do :
explain select * from table
Statistics: Num rows: 274 Data size: 110912 Basic stats: COMPLETE Column stats: NONE
2) when i do :
explain select count(*) from table;
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
3) when i do :
explain select partner_id from table:
Statistics: Num rows: 27728 Data size: 110912 Basic stats: COMPLETE Column stats: NONE
Appreciate your help
Created 06-24-2018 06:09 AM
Did you try to run analyze table?
hive> analyze table salaries compute statistics; Query ID = hive_20180624055914_ad06fa7a-ae16-4658-a4fb-5eabc1c2425a Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1529818303832_0002) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 .......... SUCCEEDED 1 1 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.54 s -------------------------------------------------------------------------------- Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732] OK Time taken: 5.64 seconds
And I have a few questions about your issue:
- What kinda table we're talking? External or Managed? Could you share with us the describe formatted output from your table?
- Are you running hive on tez or mr? And is there any specific file format for this table?
More details about the analyze on the link below 🙂
https://cwiki.apache.org/confluence/display/Hive/StatsDev
Hope this helps!
Created 06-25-2018 05:21 AM
Hi Vinicius,
Yes its
1) Managed table
2) Engine is MR
3) Its a text format
hive> show create table payout_curreny_master_temp ; OK CREATE TABLE `payout_curreny_master_temp`( `partner_id` int, `program_name` string, `program_type` string, `payout_currency` string, `delete_flag` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'field.delim'=',', 'serialization.format'=',') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://nameservice1/user/hive/warehouse/xyz/payout_curreny_master_temp' TBLPROPERTIES ( 'COLUMN_STATS_ACCURATE'='true', 'numFiles'='1', 'numRows'='0', 'rawDataSize'='0', 'totalSize'='110912', 'transient_lastDdlTime'='1529665548')
Created 06-28-2018 04:44 PM
Hm, what about the ANALYZE TABLE <TBL_NAME> COMPUTE STATISTICS?
I made a test here, and it's doing good so far:
--TABLE HAS 50 ROWS! 0: jdbc:hive2://node3:10000/default> CREATE EXTERNAL TABLE `salaries`( 0: jdbc:hive2://node3:10000/default> `gender` string, 0: jdbc:hive2://node3:10000/default> `age` int, 0: jdbc:hive2://node3:10000/default> `salary` double, 0: jdbc:hive2://node3:10000/default> `zip` int) 0: jdbc:hive2://node3:10000/default> ROW FORMAT DELIMITED 0: jdbc:hive2://node3:10000/default> FIELDS TERMINATED BY ',' 0: jdbc:hive2://node3:10000/default> STORED AS INPUTFORMAT 0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.mapred.TextInputFormat' 0: jdbc:hive2://node3:10000/default> OUTPUTFORMAT 0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 0: jdbc:hive2://node3:10000/default> LOCATION 0: jdbc:hive2://node3:10000/default> 'hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries' 0: jdbc:hive2://node3:10000/default> TBLPROPERTIES ( 0: jdbc:hive2://node3:10000/default> 'COLUMN_STATS_ACCURATE'='true', 0: jdbc:hive2://node3:10000/default> 'numFiles'='1', 0: jdbc:hive2://node3:10000/default> 'numRows'='0', 0: jdbc:hive2://node3:10000/default> 'rawDataSize'='732', 0: jdbc:hive2://node3:10000/default> 'totalSize'='781', 0: jdbc:hive2://node3:10000/default> 'transient_lastDdlTime'='1529819960'); No rows affected (0.443 seconds) 0: jdbc:hive2://node3:10000/default> explain select count(1) from salaries; +------------------------------------------------------------------------------------------------------+--+ | Explain | +------------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-1 is a root stage | | Stage-0 depends on stages: Stage-1 | | | | STAGE PLANS: | | Stage: Stage-1 | | Map Reduce | | Map Operator Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE | | Select Operator | | Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE | | Group By Operator | | aggregations: count(1) | | mode: hash | | outputColumnNames: _col0 | | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE | | Reduce Output Operator | | sort order: | | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE | | value expressions: _col0 (type: bigint) | | Reduce Operator Tree: | | Group By Operator | | aggregations: count(VALUE._col0) | | mode: mergepartial | | outputColumnNames: _col0 | | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE | | File Output Operator | | compressed: false | | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE | | table: | | input format: org.apache.hadoop.mapred.TextInputFormat | | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | ListSink | | | +------------------------------------------------------------------------------------------------------+--+ 42 rows selected (0.255 seconds) 0: jdbc:hive2://node3:10000/default> explain select * from salaries; +----------------------------------------------------------------------------------------------------------+--+ | Explain | +----------------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) | | outputColumnNames: _col0, _col1, _col2, _col3 | | Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | ListSink | | | +----------------------------------------------------------------------------------------------------------+--+ 17 rows selected (0.232 seconds) 0: jdbc:hive2://node3:10000/default> desc salaries; +-----------+------------+----------+--+ | col_name | data_type | comment | +-----------+------------+----------+--+ | gender | string | | | age | int | | | salary | double | | | zip | int | | +-----------+------------+----------+--+ 4 rows selected (0.426 seconds) 0: jdbc:hive2://node3:10000/default> explain select age from salaries; +------------------------------------------------------------------------------------------------+--+ | Explain | +------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: age (type: int) | | outputColumnNames: _col0 | | Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | ListSink | | | +------------------------------------------------------------------------------------------------+--+ 0: jdbc:hive2://node3:10000/default> analyze table salaries compute statistics ; INFO : Number of reduce tasks is set to 0 since there's no reduce operator INFO : number of splits:1 INFO : Submitting tokens for job: job_1529940007017_0004 INFO : The url to track the job: http://node4:8088/proxy/application_1529940007017_0004/ INFO : Starting Job = job_1529940007017_0004, Tracking URL = http://node4:8088/proxy/application_1529940007017_0004/ INFO : Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job -kill job_1529940007017_0004 INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0 INFO : 2018-06-28 16:38:56,357 Stage-0 map = 0%, reduce = 0% INFO : 2018-06-28 16:39:02,796 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.93 sec INFO : MapReduce Total cumulative CPU time: 2 seconds 930 msec INFO : Ended Job = job_1529940007017_0004 INFO : Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732] No rows affected (16.338 seconds) 0: jdbc:hive2://node3:10000/default> explain select * from salaries; +----------------------------------------------------------------------------------------------------------+--+ | Explain | +----------------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) | | outputColumnNames: _col0, _col1, _col2, _col3 | | Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | ListSink | | | +----------------------------------------------------------------------------------------------------------+--+ 17 rows selected (0.226 seconds) 0: jdbc:hive2://node3:10000/default> explain select age from salaries; +-----------------------------------------------------------------------------------------------+--+ | Explain | +-----------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: age (type: int) | | outputColumnNames: _col0 | | Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | ListSink | | | +-----------------------------------------------------------------------------------------------+--+ 17 rows selected (0.227 seconds)
And also, try to set the
set hive.stats.autogather=true;
Hope this helps!