Support Questions

Find answers, ask questions, and share your expertise

Difference between hive analyze commands

avatar
Rising Star

Can someone help me explain what the difference is between these 2 hive analyze commands:

analyze table svcrpt.predictive_customers compute statistics;
analyze table svcrpt.predictive_customers compute statistics for columns;

What more does the "for columns" part do?

1 ACCEPTED SOLUTION

avatar
Super Guru

1. analyze table svcrpt.predictive_customers compute statistics;

will compute basic stats of the table like numFiles, numRows, totalSize, rawDataSize in the table, these are stored in

TABLE_PARAMS table under hive metastore db.

2. analyze table svcrpt.predictive_customers compute statistics for columns;

create/update column level stats like NUM_DISTINCTS,LOW_VALUE,HIGH_VALUE,NUM_NULLS etc in TAB_COL_STATS table under metastore db

View solution in original post

4 REPLIES 4

avatar
Super Guru

1. analyze table svcrpt.predictive_customers compute statistics;

will compute basic stats of the table like numFiles, numRows, totalSize, rawDataSize in the table, these are stored in

TABLE_PARAMS table under hive metastore db.

2. analyze table svcrpt.predictive_customers compute statistics for columns;

create/update column level stats like NUM_DISTINCTS,LOW_VALUE,HIGH_VALUE,NUM_NULLS etc in TAB_COL_STATS table under metastore db

avatar
Rising Star

Got it, thanks! Does the for columns command also do the basic stats that the first analyze command does, or would I have to run them both to get both sets of stats computed?

avatar
Super Guru

with columns stats you will be able to update basic stat also

avatar
Rising Star

Thanks. I just did my own testing to see if "for columns" would also update TABLE_PARAMS table and I found that it did not.

For instance, when I run "analyze table svcrpt.predictive_customers compute statistics;" the column transient_lastDdlTime in the table TABLE_PARAMS gets updated, but if I run "analyze table svcrpt.predictive_customers compute statistics for columns;" transient_lastDdlTime does not updated.

So does this mean "for columns" does not update the basic stats?