Hive compute column stats without invoking a (MR or Tez) job

To compute column stats, we can use the ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS statement.

However, running this statement in Hive triggers a MapReduce or Tez job.

Is there a way to do the same task without invoking an MR or Tez job?

Update:

I do not quite see why anyone would want to do this. I am posting it because I was asked this question in an interview recently.


Re: Hive compute column stats without invoking a (MR or Tez) job

@Ramesh Prasad

To avoid spawning an MR/Tez job, add the NOSCAN keyword at the end of the ANALYZE statement.

When the optional parameter NOSCAN is specified, the command won't scan files so that it's supposed to be fast. Instead of all statistics, it just gathers the following statistics:

  • Number of files
  • Physical size in bytes

https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-NewlyCreatedTables
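
As a rough illustration of the suggestion above, the sketch below builds the two table-level statement shapes, with and without NOSCAN. The helper name `table_stats_statement` and the table name `sales` are made up for this example. The code only assembles the HiveQL text and does not connect to Hive.

```python
def table_stats_statement(table: str, noscan: bool = False) -> str:
    """Build a table-level ANALYZE statement as a HiveQL string.

    `table` is a hypothetical table name supplied by the caller.
    With noscan=True the NOSCAN keyword is appended at the end, which
    per the Hive wiki gathers only the number of files and the
    physical size in bytes.
    """
    statement = f"ANALYZE TABLE {table} COMPUTE STATISTICS"
    if noscan:
        statement += " NOSCAN"
    return statement


if __name__ == "__main__":
    # Full table-level statistics (scans the data).
    print(table_stats_statement("sales"))
    # File count and size only (no scan).
    print(table_stats_statement("sales", noscan=True))
```

The generated strings can be pasted into the Hive CLI or Beeline with a trailing semicolon.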


Re: Hive compute column stats without invoking a (MR or Tez) job

@Sindhu I want to compute column stats too, so I tried using NOSCAN. However, Hive treats it as a syntax error. NOSCAN only works for the table-level ANALYZE TABLE statement.

How can we compute column stats?
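
For reference, here is a minimal sketch of the column-level form, which does launch a job, together with a guard for the combination reported here as a syntax error. The helper name `column_stats_statement`, the table `sales`, and the columns `price` and `quantity` are invented for the example. The code only builds the HiveQL text and does not run it.

```python
from typing import Sequence


def column_stats_statement(table: str, columns: Sequence[str],
                           noscan: bool = False) -> str:
    """Build a column-level ANALYZE statement as a HiveQL string.

    `table` and `columns` are hypothetical names supplied by the caller.
    NOSCAN is rejected here because, as reported in this thread, Hive
    does not accept it together with FOR COLUMNS.
    """
    if noscan:
        raise ValueError("NOSCAN cannot be combined with FOR COLUMNS")
    if not columns:
        raise ValueError("at least one column name is required")
    column_list = ", ".join(columns)
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {column_list}"


if __name__ == "__main__":
    print(column_stats_statement("sales", ["price", "quantity"]))
```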


Thank you for your effort.


Re: Hive compute column stats without invoking a (MR or Tez) job

@Ramesh Prasad

It can't be done in any regular, standard, or automated way, and any other procedure is not recommended. So I think the interviewer just wanted to make sure you understand the basic MR/Hadoop concepts.
