Created 08-23-2017 06:09 PM
To compute column stats, we can use the ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS statement.
However, running it in Hive triggers a MapReduce or Tez job.
Is there a way to do the same task without launching an MR or Tez job?
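For reference, this is roughly what the column-stats statement the question describes looks like in HiveQL. This is a minimal sketch. The table name `sales` and the column names `customer_id` and `amount` are hypothetical placeholders. On a partitioned table, older Hive versions may also require a PARTITION clause.

```sql
-- Hypothetical table "sales": gather column-level statistics
-- (such as distinct-value counts, null counts, min/max) for every column.
-- In Hive this runs as an MR or Tez job because the data must be read.
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;

-- Same, restricted to two (hypothetical) columns.
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS customer_id, amount;
```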
Update:
I don't quite see why anyone would want to do this; I posted it because I was recently asked this question in an interview.
Created 08-23-2017 06:20 PM
To avoid spawning an MR/Tez job, add the NOSCAN keyword at the end of the ANALYZE statement.
When the optional NOSCAN parameter is specified, the command does not scan the data files, so it is fast. Instead of all statistics, it gathers only the number of files and the physical size in bytes. See:
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-NewlyCreatedTables
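A sketch of the NOSCAN form described above, again using a hypothetical table `sales` and a hypothetical partition column `ds`. Because NOSCAN only reads file-system metadata, it fills in the basic table parameters (such as numFiles and totalSize) rather than column-level statistics.

```sql
-- Hypothetical table "sales": basic stats only, data files are not scanned.
ANALYZE TABLE sales COMPUTE STATISTICS NOSCAN;

-- For a partitioned table (hypothetical partition column "ds"):
ANALYZE TABLE sales PARTITION (ds='2017-08-23') COMPUTE STATISTICS NOSCAN;

-- Inspect the result: numFiles and totalSize appear under "Table Parameters".
DESCRIBE FORMATTED sales;
```

Note that NOSCAN applies to this basic-statistics form; it is not a way to compute column statistics.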
Created 08-23-2017 06:26 PM
@Sindhu I want to compute column stats too, so I tried NOSCAN, but Hive reports a syntax error. It only works for a plain ANALYZE TABLE ... COMPUTE STATISTICS, not with FOR COLUMNS.
How can we compute column stats?
Thank you for your effort.
Created 09-06-2017 02:09 PM
It can't be done in any regular, standard, automated way, and any other procedure is not recommended. So I think the interviewer just wanted to check that you understand the basic MR/Hadoop concepts.