yarn & tez container memory


TL;DR: how do I properly set hive.tez.container.size for a job with wildly different steps?

I have an 8-data-node HDP 2.6 cluster; all data nodes are identical, with 32 GB of RAM each.

  • yarn.scheduler.maximum-allocation-mb is set to the total server RAM minus what is used by other services (OS, NodeManager...), i.e. 20 GB in my case;
  • yarn.scheduler.minimum-allocation-mb is set to 1 GB.
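For reference, this is roughly what those settings look like in yarn-site.xml (the values below are the ones from my cluster, expressed in MB):

```xml
<!-- Largest container YARN will grant: 20 GB (total RAM minus OS/services overhead) -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>20480</value>
</property>
<!-- Smallest container YARN will grant: 1 GB -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```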

I am running only one Hive MERGE statement, once per day, which spawns about 100k mappers.

If I set hive.tez.container.size to 1 GB, many mappers can run in parallel (faster query), but I end up with one of these errors:

  • Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1510697553800_0993_2_03, diagnostics=[Task failed, taskId=task_1510697553800_0993_2_03_000150, diagnostics=[TaskAttempt 0 failed, info=[Container container_e102_1510697553800_0993_01_000042 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=32295,containerID=container_e102_1510697553800_0993_01_000042] is running beyond physical memory limits. Current usage: 5.4 GB of 5.3 GB physical memory used; 7.4 GB of 11.0 GB virtual memory used. Killing container.
  • Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1511269090751_0011_2_03, diagnostics=[Exception in VertexManager, vertex:vertex_1511269090751_0011_2_03 [Reducer 3],org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist.

If I set hive.tez.container.size to a bigger value, far fewer containers run in parallel (longer query time), but the query eventually succeeds.
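Concretely, this is how I override the container size at the session level before running the MERGE (the values here are just an example of a "bigger" setting, not a recommendation):

```sql
-- Request larger Tez containers for this session only (MB; must lie between
-- yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb)
SET hive.tez.container.size=4096;
-- JVM heap inside the container; commonly sized to ~80% of the container
SET hive.tez.java.opts=-Xmx3276m;

-- ... then run the daily MERGE statement in the same session
```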

The thing is, I do not know in advance how big the data will be, so even if I find a good hive.tez.container.size by trial and error, it might not be good enough tomorrow, and eventually my server memory may be too small. Furthermore, sizing for the worst-case scenario feels like a waste of resources.

Is there any way to have a sort of dynamic Tez container size, so that the query is both fast and successful?

Cheers,
