Support Questions

What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

New Contributor

Hi,

I am trying to run the ANALYZE TABLE command on a big Hive table (approx. 3 TB), and it throws a Java OOM error. I have tried increasing the heap on almost all the containers (Hive, HS2, Hive Metastore, YARN, Tez), but the error still occurs when analyzing some of the partitions. Which properties do the JVMs spawned for the ANALYZE query take their parameters from? Is it like a normal Hive job (taking the Tez/Hive parameters)?

Any information beyond https://cwiki.apache.org/confluence/display/Hive/StatsDev is greatly appreciated.

Thanks


Re: What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

New Contributor

Note: when I iterate through the partitions and analyze each partition separately, it works. The error only occurs when I try to analyze the whole table at once.
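For reference, the two variants being compared look roughly like this in HiveQL, following the ANALYZE syntax on the StatsDev wiki page. The table name and partition column are borrowed from the warnings posted later in this thread (default.perf_gcs_load_partitioned, cdrs_utc_date); the date value is only an example.

```sql
-- Analyze a single partition (the approach that works when run one partition at a time).
ANALYZE TABLE default.perf_gcs_load_partitioned
  PARTITION (cdrs_utc_date='2016-09-21')
  COMPUTE STATISTICS;

-- Analyze all partitions in one statement: naming the partition column without a value
-- asks Hive to gather statistics for every partition (the approach that fails with the OOM here).
ANALYZE TABLE default.perf_gcs_load_partitioned
  PARTITION (cdrs_utc_date)
  COMPUTE STATISTICS;
```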

Re: What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

ANALYZE runs as a MapReduce or Tez job like any other query, and these jobs are fairly heavy.

So normally the relevant settings are the Tez parameters on the Hive configuration page.

Or just tune them in the Hive console itself for faster turnaround. (Note: the YARN maximum container size, yarn.scheduler.maximum-allocation-mb, must still be at least as large as the Tez container.)

set hive.tez.java.opts=-Xmx3400m;

set hive.tez.container.size=4096;
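A session-level sketch of the same idea, assuming the common rule of thumb of giving the Tez JVM heap roughly 80% of the container (3400 MB of 4096 MB here; the 6553/8192 pair later in the thread follows the same ratio) so that non-heap JVM memory still fits in the container. The values are illustrative, the table name is borrowed from later in the thread, and the settings only matter if the ANALYZE statement actually runs as a Tez job.

```sql
-- Tez container size in MB; must not exceed yarn.scheduler.maximum-allocation-mb.
set hive.tez.container.size=4096;
-- Heap for the Tez task JVMs, about 80% of the container, leaving headroom for non-heap memory.
set hive.tez.java.opts=-Xmx3400m;
-- Run the statement in the same session so it picks up the values above.
ANALYZE TABLE default.perf_gcs_load_partitioned PARTITION (cdrs_utc_date) COMPUTE STATISTICS;
```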

Re: What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

New Contributor

My current configs are:

hive.tez.java.opts = "-Xmx6553m"

hive.tez.container.size=8192

My YARN max is 16 GB, and I have also tried running the ANALYZE command after raising the YARN max and the Tez container properties above to 16 GB and 24 GB respectively. It still fails with the Java OOM error.

Re: What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

Where exactly does the OOM occur? Perhaps in the sort memory? You might need to increase that too. There are also parameters for GROUP BY memory:

Sort memory:

tez.runtime.io.sort.mb

Group by (map-side aggregation) in map tasks:

hive.map.aggr

hive.map.aggr.hash.percentmemory
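If the OOM turns out to be in the sort buffer or in map-side aggregation, these properties can also be set per session. The values below are placeholders to show the syntax, not tuning recommendations; the sort buffer is allocated inside the task heap, so it has to fit within the -Xmx given in hive.tez.java.opts.

```sql
-- Sort buffer for Tez task output, in MB (placeholder value); must fit inside the task heap.
set tez.runtime.io.sort.mb=1024;
-- Map-side aggregation for GROUP BY; setting it to false avoids the map-side hash table
-- at the cost of shuffling more rows to the reducers.
set hive.map.aggr=true;
-- Fraction of the task heap the map-side aggregation hash table may use (Hive's default is 0.5).
set hive.map.aggr.hash.percentmemory=0.5;
```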

Re: What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

New Contributor

@Benjamin Leonhardi My sort memory is 3276 MB. However, I managed to solve the OOM issue by increasing the Hadoop Maximum Java Heap size in the HDFS configs to 16 GB, which increases the Hive Metastore and HS2 Java heap. After doing so, I no longer get the OOM error, but running the ANALYZE query produces these warnings before completing successfully:

[Warning] could not update stats for default.perf_gcs_load_partitioned{cdrs_utc_date=2016-09-21}.

[Warning] could not update stats for default.perf_gcs_load_partitioned{cdrs_utc_date=2016-07-15}.

[Warning] could not update stats for default.perf_gcs_load_partitioned{cdrs_utc_date=2016-04-16}.

OK

Time taken: 278.255 seconds
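To see what the metastore recorded for one of the partitions named in the warnings, DESCRIBE FORMATTED prints the partition parameters, where basic statistics such as numFiles, numRows and rawDataSize appear once they have been gathered. Re-running ANALYZE for just that partition is one way to retry it; this is a sketch for investigation, not a confirmed fix for the warning.

```sql
-- Inspect the partition's parameters, including any basic statistics stored in the metastore.
DESCRIBE FORMATTED default.perf_gcs_load_partitioned PARTITION (cdrs_utc_date='2016-09-21');

-- Retry statistics collection for only the partition that produced the warning.
ANALYZE TABLE default.perf_gcs_load_partitioned PARTITION (cdrs_utc_date='2016-09-21') COMPUTE STATISTICS;
```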

Re: What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

New Contributor

I managed to solve the OOM issue by increasing the Hadoop Maximum Java Heap size in the HDFS configs to 16 GB, which increases the Hive Metastore and HS2 Java heap, rather than using the slider provided for the Hive heap in the Hive configs in Ambari 2.1.2. I still do not know why the ANALYZE query requires more heap memory in HS2 and the Hive Metastore (the ORC statistics are not that large).

Re: What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

Not sure, but after the query runs, Hive does some post-processing: it moves the results to HiveServer2 and inserts them into the Metastore. Why that should require 16 GB of RAM, though... As you said, it should not be that much data.

But it's good that they finally added sliders for the Hive heap. Until recently I still changed the hive-env settings by hand.

Re: What JVMs are spawned when the Analyze command is run on a partitioned ORC table in Hive?

New Contributor

Hey @vagrawal, @Benjamin Leonhardi, could you please confirm whether a MapReduce or Tez job is launched for the `ANALYZE TABLE` command? I can't see any job being spawned, and HiveServer2 hangs while the analysis is in progress.