Difference between hive analyze commands

Explorer

Created 12-15-2016 12:48 PM

Can someone explain the difference between these two Hive analyze commands?

analyze table svcrpt.predictive_customers compute statistics;
analyze table svcrpt.predictive_customers compute statistics for columns;

What more does the "for columns" part do?

Accepted solution

Guru

Created 12-15-2016 01:02 PM

1. analyze table svcrpt.predictive_customers compute statistics;

computes basic table statistics such as numFiles, numRows, totalSize, and rawDataSize. These are stored in the TABLE_PARAMS table in the Hive metastore database.

2. analyze table svcrpt.predictive_customers compute statistics for columns;

creates or updates column-level statistics such as NUM_DISTINCTS, NUM_NULLS, and the low and high values of each column. These are stored in the TAB_COL_STATS table in the metastore database.
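A minimal HiveQL sketch of both commands, followed by DESCRIBE FORMATTED to see what each one recorded without querying the metastore database directly. The column name customer_id is hypothetical; replace it with a real column of the table. The exact output layout of DESCRIBE FORMATTED varies between Hive versions.

```sql
-- Switch to the database so the column-level DESCRIBE below can use an
-- unqualified table name.
USE svcrpt;

-- Table-level (basic) stats: numFiles, numRows, totalSize, rawDataSize
ANALYZE TABLE predictive_customers COMPUTE STATISTICS;

-- Column-level stats for all columns (distinct count, nulls, min/max, ...)
ANALYZE TABLE predictive_customers COMPUTE STATISTICS FOR COLUMNS;

-- Basic stats are listed under "Table Parameters" in this output
DESCRIBE FORMATTED predictive_customers;

-- Column stats for one column; customer_id is a hypothetical column name
DESCRIBE FORMATTED predictive_customers customer_id;
```

Running the table-level DESCRIBE after each ANALYZE statement separately is a simple way to see which of the two commands changed the table parameters.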

Explorer

Created 12-15-2016 01:06 PM

Got it, thanks! Does the "for columns" command also compute the basic stats that the first analyze command does, or do I have to run both to get both sets of stats computed?

Guru

Created 12-15-2016 01:18 PM

Computing column stats will update the basic stats as well.

Explorer

Created 12-15-2016 01:24 PM

Thanks. I just ran my own test to see whether "for columns" also updates the TABLE_PARAMS table, and found that it does not.

For instance, when I run "analyze table svcrpt.predictive_customers compute statistics;", the transient_lastDdlTime entry in TABLE_PARAMS gets updated, but when I run "analyze table svcrpt.predictive_customers compute statistics for columns;", transient_lastDdlTime is not updated.

So does this mean "for columns" does not update the basic stats?
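The TABLE_PARAMS check described in this post can be reproduced with queries like the following, run against the metastore's backing database (for example MySQL) rather than through Hive. This is a sketch that assumes the standard metastore schema (TBLS, DBS, TABLE_PARAMS, TAB_COL_STATS); table and column names can differ between Hive versions and backing databases, so verify them against your own schema first.

```sql
-- Run in the metastore RDBMS (e.g. MySQL), not in Hive.
-- Assumes the standard metastore schema; verify names for your version.

-- Table parameters: basic stats (numRows, totalSize, ...) and
-- transient_lastDdlTime for svcrpt.predictive_customers
SELECT p.PARAM_KEY, p.PARAM_VALUE
FROM TABLE_PARAMS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE d.NAME = 'svcrpt'
  AND t.TBL_NAME = 'predictive_customers';

-- Column-level stats written by "compute statistics for columns"
SELECT COLUMN_NAME, NUM_DISTINCTS, NUM_NULLS, LAST_ANALYZED
FROM TAB_COL_STATS
WHERE DB_NAME = 'svcrpt'
  AND TABLE_NAME = 'predictive_customers';
```

Capturing both result sets before and after each ANALYZE command shows which metastore table each form of the command writes to.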
