Hive: compute column stats without invoking an MR or Tez job

Explorer

To compute column statistics, we can use the ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS statement.
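For reference, the column-level form of the statement is the string that this small Python helper builds. The helper `column_stats_statement` and the names `sales`, `id`, and `amount` are hypothetical. It is a sketch that only assembles the SQL text and does not connect to Hive.

```python
from typing import Sequence


def column_stats_statement(table: str, columns: Sequence[str]) -> str:
    """Assemble a HiveQL statement that computes column statistics.

    Hypothetical helper: it only builds the SQL text. Submitting the
    statement to Hive is what launches the MR or Tez job.
    """
    if not columns:
        # This sketch always names the columns explicitly.
        raise ValueError("at least one column name is required")
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {', '.join(columns)};"


print(column_stats_statement("sales", ["id", "amount"]))
# ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS id, amount;
```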

However, running it in Hive triggers a MapReduce or Tez job.

Is there a way to do the same thing without launching an MR or Tez job?

Update:

I am not sure why anyone would want to do this; I posted the question because I was asked it in an interview recently.


@Ramesh Prasad

To avoid spawning an MR/Tez job, add the NOSCAN keyword at the end of the ANALYZE statement.

When the optional NOSCAN parameter is specified, the command does not scan the files, so it should be fast. Instead of gathering all statistics, it gathers only the following:

  • Number of files
  • Physical size in bytes
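Both of these statistics can, in principle, be derived from file-system metadata alone, which is why no scan is needed. As a rough local analogue (assumptions for illustration: a table directory on the local disk rather than HDFS, and a made-up function name `file_count_and_size`), the sketch below counts files and sums their sizes with `os.walk` and `os.stat` without opening any file. It is not how Hive implements NOSCAN; on the Hive side the table-level statement would be something like `ANALYZE TABLE sales COMPUTE STATISTICS NOSCAN;`, with `sales` a hypothetical table.

```python
import os
import tempfile
from typing import Tuple


def file_count_and_size(table_dir: str) -> Tuple[int, int]:
    """Return (number of files, total size in bytes) under table_dir.

    Only directory listings and file metadata (os.stat) are used;
    no file contents are read, which is why this kind of pass is cheap.
    """
    num_files = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(table_dir):
        for name in files:
            num_files += 1
            total_bytes += os.stat(os.path.join(root, name)).st_size
    return num_files, total_bytes


# Usage: a throwaway directory standing in for a table's storage location.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "part-00000"), "wb") as f:
        f.write(b"abc")
    os.mkdir(os.path.join(d, "sub"))
    with open(os.path.join(d, "sub", "part-00001"), "wb") as f:
        f.write(b"12345")
    print(file_count_and_size(d))  # (2, 8)
```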

https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-NewlyCreatedTables

Explorer

@Sindhu I want to compute column stats too, so I tried NOSCAN, but Hive reports it as a syntax error. It only works with the table-level ANALYZE TABLE ... COMPUTE STATISTICS statement, not with FOR COLUMNS.

How can we compute column stats?


Thank you for your effort.

Contributor

@Ramesh Prasad

It can't be done in any regular, standard, or automated way, and any other procedure is not recommended. I think the interviewer just wanted to make sure you understand basic MR/Hadoop concepts.
