Viewing Hive Column or Table level Statistics
- Labels: Apache Hive
Created ‎12-16-2015 06:56 PM
Does anyone know of a way to view the statistics which are created after the command "analyze table [myTable] compute statistics;" is executed?
Referenced from here: http://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/
Created ‎12-16-2015 06:58 PM
@Wes Floyd Here is Hive Stats detail https://cwiki.apache.org/confluence/display/Hive/S...
You can view the stored statistics by issuing the DESCRIBE command. Statistics are stored in the Parameters array. Suppose you issue the analyze command for the whole table Table1, then issue the command:
DESCRIBE EXTENDED TABLE1;
then among the output, the following would be displayed:
... , parameters:{numPartitions=4, numFiles=16, numRows=2000, totalSize=16384, ...}, ....
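As a rough sketch of the steps above, the full sequence might look like the following. Only the DESCRIBE EXTENDED command comes from the answer itself; the partitioned table `page_views` and its partition column `ds` are hypothetical names used purely for illustration. DESCRIBE FORMATTED is shown as a more readable alternative, since it prints the same parameters one per line.

```sql
-- Gather table-level statistics (numFiles, numRows, totalSize, ...).
ANALYZE TABLE Table1 COMPUTE STATISTICS;

-- Compact view: the statistics appear inside parameters:{...}.
DESCRIBE EXTENDED Table1;

-- More readable view: the same parameters, listed one per line.
DESCRIBE FORMATTED Table1;

-- Hypothetical partitioned table: analyze and inspect a single partition.
ANALYZE TABLE page_views PARTITION (ds='2015-12-16') COMPUTE STATISTICS;
DESCRIBE FORMATTED page_views PARTITION (ds='2015-12-16');
```

If numRows is missing or shows 0 for a table that holds data, that usually suggests the ANALYZE step has not yet been run for that table or partition.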
Created ‎12-16-2015 11:34 PM
See this thread here: https://community.hortonworks.com/questions/4759/hive-explain-says-plan-not-optimized-by-cbo-due-to....
We couldn't find a way to see column stats (analyze table t compute statistics for columns). I think describe extended shows only table stats.
We're also looking for a way to get rid of the warning from the question above: "Plan not optimized by CBO due to missing statistics. Please check log for more details."
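For context, here is a hedged sketch of the setup usually associated with that warning. It is not a confirmed fix from this thread. The property names are standard Hive settings listed in the blog post linked in the original question; the table `t` follows the post above, while the column `c` and the query are hypothetical.

```sql
-- Enable the cost-based optimizer and let it read stored statistics.
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;

-- Compute both table-level and column-level statistics for every table
-- that the query touches.
ANALYZE TABLE t COMPUTE STATISTICS;
ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS;

-- Re-check the plan (hypothetical query) to see whether the warning remains.
EXPLAIN SELECT c, COUNT(*) FROM t GROUP BY c;
```

If the warning persists, it may be that another table in the query still lacks column statistics, so each joined table would need the same ANALYZE steps.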
Created ‎12-18-2015 01:55 PM
@Guilherme Braccialli If you've already analyzed the columns you can issue a describe table command to get column stats:
"As of Hive 0.10.0, the optional parameter FOR COLUMNS computes column statistics for all columns in the specified table (and for all partitions if the table is partitioned). See Column Statistics in Hive for details.
To display these statistics, use DESCRIBE FORMATTED [db_name.]table_name column_name [PARTITION (partition_spec)]."
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
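The quoted syntax could be used roughly as follows. This is a minimal sketch: the database `sales_db`, the tables `orders` and `orders_by_day`, the columns `unit_price` and `quantity`, and the partition column `order_date` are all hypothetical names, and the exact form accepted may vary by Hive version.

```sql
-- Compute column statistics for every column of the table.
ANALYZE TABLE sales_db.orders COMPUTE STATISTICS FOR COLUMNS;

-- Or restrict the computation to specific columns.
ANALYZE TABLE sales_db.orders COMPUTE STATISTICS FOR COLUMNS unit_price, quantity;

-- Display the statistics for one column; naming the column is what makes
-- DESCRIBE FORMATTED show column stats instead of table-level stats.
DESCRIBE FORMATTED sales_db.orders unit_price;

-- For a partitioned table, name the partition as well.
DESCRIBE FORMATTED sales_db.orders_by_day unit_price PARTITION (order_date='2015-12-18');
```

Running DESCRIBE FORMATTED without a column name returns only the table-level parameters, which may explain why column statistics appear to be missing.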
Created ‎12-18-2015 01:57 PM
Yes, @Neeraj Sabharwal and I tried this a few times, but we can't see column statistics, only table-level ones.
Created ‎02-20-2017 12:27 AM
I've got a working example at https://github.com/lestermartin/oss-transform-processing-comparison/tree/master/profiling#hive that shows column stats.
Created ‎02-03-2016 02:33 AM
Thank you Pardeep. This helps me understand how to see Table level statistics. Do you have a solution for Column level stats also?
Created ‎02-03-2016 02:21 AM
@Wes Floyd Has this been resolved? Please provide your solution or accept the best answer.
Created ‎07-27-2016 04:13 PM
For those interested in viewing column level stats try this...
analyze table orderdetails compute statistics for columns;
describe formatted orderdetails.unitprice;

col_name   data_type  min  max   num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
unitprice  double     2.0  26.3  0          127                                                              foo
Created ‎06-30-2017 01:29 PM
For those looking for an easy graphical tool, the Hive View 2.0 (included with Ambari 2.5 and up) has the ability to view table and column level stats, and to compute them if they are missing.
For more info see https://hortonworks.com/blog/3-great-reasons-to-try-hive-view-2-0/
Note that column stats are listed under table stats and you can see the individual column's statistics there.
