Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

Query shows no progress on tez

Explorer

I have a rather large partitioned table with the following description.

|PARTITIONED BY (                                                                               |
|   `dt_month` string)                                                                           |
| ROW FORMAT SERDE                                                                               |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'                                                  |
| STORED AS INPUTFORMAT                                                                          |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'                                            |
| OUTPUTFORMAT                                                                                   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'                                           |
| LOCATION                                                                                       |
|   ''  |
| TBLPROPERTIES (                                                                                |
|   'COLUMN_STATS_ACCURATE'='true',                                                              |
|   'last_modified_by'='user1',                                                                 |
|   'last_modified_time'='1483944081',                                                           |
|   'orc.bloom.filter.columns'='id2,id1,pjkd,gen_ds,key,rf',                            |
|   'orc.bloom.filter.fpp'='0.05',                                                               |
|   'orc.create.index'='true',                                                                   |
|   'spark.sql.partitionProvider'='catalog',                                                     |
|   'spark.sql.statistics.colStats.id1.avgLen'='10',                                            |
|   'spark.sql.statistics.colStats.id1.distinctCount'='36919609',                               |
|   'spark.sql.statistics.colStats.id1.maxLen'='10',                                            |
|   'spark.sql.statistics.colStats.id1.nullCount'='0',                                          |
|   'spark.sql.statistics.colStats.id1.version'='1',                                            |
|   'spark.sql.statistics.colStats.id2.avgLen'='10',                                            |
|   'spark.sql.statistics.colStats.id2.distinctCount'='9936718',                                |
|   'spark.sql.statistics.colStats.id2.maxLen'='10',                                            |
|   'spark.sql.statistics.colStats.id2.nullCount'='0',                                          |
|   'spark.sql.statistics.colStats.id2.version'='1',                                            |
|   'spark.sql.statistics.numRows'='20623558494',                                                |
|   'spark.sql.statistics.totalSize'='1355372444614',                                            |
|   'transient_lastDdlTime'='1504869132')                                                        |
+------------------------------------------------------------------------------------------------+--+

I tried to execute a simple max on one of the columns with the following query and it shows no progress whatsoever. I have checked to ensure there are no locks on the table. I have checked that there are enough free resources on the cluster available. There is an application id created for the hive query and it shows up on yarn with no progress.

45409-no-progress.png

Any tips on how I can start debugging would be very welcome!

set hive.exec.orc.memory.pool=1.0;
set hive.exec.parallel=true;
set tez.runtime.unordered.output.buffer.size-mb=819;
set tez.runtime.io.sort.mb = 3276;
set hive.tez.container.size = 9216;
set hive.tez.java.opts = -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps;
set mapreduce.map.memory.mb=5120
set mapreduce.map.java.opts=-Xmx4096m
set mapreduce.reduce.memory.mb=12288
set mapreduce.reduce.java.opts=-Xmx9830m 
set tez.am.container.reuse.enabled=true;
set hive.mapjoin.optimized.hashtable=true;
set hive.prewarm.enabled=true;
set hive.tez.dynamic.partition.pruning=false;
select 'g' + max(gen_ds) as gen_ds_old from table;
3 REPLIES 3

Explorer

Query succeeds on mr!

Cloudera Employee

Hi Subramaniam:

It is likely that the job is exceeding the available YARN resources on your cluster or as defined by your default YARN queue. Take a look at the YARN Resource Manager UI for the job. You will likely see that the required number of vcores or RAM exceeds the resources made available to YARN on your cluster.

Thank you,

Jeff Groves

Explorer

Hi @Jeff Groves,

thanks for looking into this. But that doesnt seem very probably. Right now there is about 4TB of memory available on the cluster and there are 400Vcores available for the default queue that I am running on. The container size on yarn can be anything between 4 and 156GB and between 1 and 16 Vcores based on the configuration in yarn.

Given that I am requesting roughly 9GB for each tez container and the default number of vcores(1 I assume), I shouldnt be exceeding these in anyway!

The only problem that I can see is that, since this is a large table, I expect some 4000 containers to be requested totally but shouldnt yarn just handle them on a max allocation basis?(like 300 at a time and they get scheduled as some complete and become available?) If not, how exactly can I control this behavior?

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.