Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Difference between hive analyze commands

Solved Go to solution
Highlighted

Difference between hive analyze commands

Can someone help me explain what the difference is between these 2 hive analyze commands:

analyze table svcrpt.predictive_customers compute statistics;
analyze table svcrpt.predictive_customers compute statistics for columns;

What more does the "for columns" part do?

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Difference between hive analyze commands

1. analyze table svcrpt.predictive_customers compute statistics;

will compute basic stats of the table like numFiles, numRows, totalSize, rawDataSize in the table, these are stored in

TABLE_PARAMS table under hive metastore db.

2. analyze table svcrpt.predictive_customers compute statistics for columns;

create/update column level stats like NUM_DISTINCTS,LOW_VALUE,HIGH_VALUE,NUM_NULLS etc in TAB_COL_STATS table under metastore db

View solution in original post

4 REPLIES 4
Highlighted

Re: Difference between hive analyze commands

1. analyze table svcrpt.predictive_customers compute statistics;

will compute basic stats of the table like numFiles, numRows, totalSize, rawDataSize in the table, these are stored in

TABLE_PARAMS table under hive metastore db.

2. analyze table svcrpt.predictive_customers compute statistics for columns;

create/update column level stats like NUM_DISTINCTS,LOW_VALUE,HIGH_VALUE,NUM_NULLS etc in TAB_COL_STATS table under metastore db

View solution in original post

Highlighted

Re: Difference between hive analyze commands

Got it, thanks! Does the for columns command also do the basic stats that the first analyze command does, or would I have to run them both to get both sets of stats computed?

Highlighted

Re: Difference between hive analyze commands

with columns stats you will be able to update basic stat also

Highlighted

Re: Difference between hive analyze commands

Thanks. I just did my own testing to see if "for columns" would also update TABLE_PARAMS table and I found that it did not.

For instance, when I run "analyze table svcrpt.predictive_customers compute statistics;" the column transient_lastDdlTime in the table TABLE_PARAMS gets updated, but if I run "analyze table svcrpt.predictive_customers compute statistics for columns;" transient_lastDdlTime does not updated.

So does this mean "for columns" does not update the basic stats?

Don't have an account?
Coming from Hortonworks? Activate your account here