I have a count(*) query that fails with an out-of-memory (OOM) error:
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1525980884509_183392_1_00, diagnostics=[Task failed, taskId=task_1525980884509_183392_1_00_000056, diagnostics=[TaskAttempt 0 failed, info=[Container container_1525980884509_183392_01_000104 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=32165,containerID=container_1525980884509_183392_01_000104] is running beyond physical memory limits. Current usage: 1.7 GB of 1.5 GB physical memory used; 3.5 GB of 7.5 GB virtual memory used. Killing container.
I know I can just increase my mapper container size, but I'm trying to understand this a little better.
By default, the query runs with 48 mappers, which is determined by
- tez.grouping.max-size (default 1073741824, which is 1 GB)
The total file size is 48 GB, so that's how I get the 48 mappers.
I then ran set tez.grouping.max-size=52428800; (50 MB)
and this increased the number of mappers to ~750, but I still get the same OOM error:
Current usage: 1.7 GB of 1.5 GB physical memory used; 3.5 GB of 7.5 GB virtual memory used. Killing container.
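As a sanity check on the mapper counts, here is a rough Python sketch of the arithmetic I'm assuming: total input size divided by tez.grouping.max-size, rounded up. The function name is made up for illustration, "48 GB" is treated as 48 GiB, and the estimate ignores tez.grouping.min-size and any other factors Tez may apply when grouping splits, so it is only a naive upper-level guess and not how Tez actually computes it.

```python
GIB = 1024 ** 3  # assumption: "48 GB" in the post means 48 GiB


def estimate_mappers(total_bytes: int, max_group_bytes: int) -> int:
    """Naive estimate: one mapper per group of at most max_group_bytes.

    Hypothetical helper; uses integer ceiling division and ignores
    tez.grouping.min-size and any other grouping heuristics.
    """
    return -(-total_bytes // max_group_bytes)


total = 48 * GIB
default_estimate = estimate_mappers(total, 1073741824)  # default max-size (1 GiB)
reduced_estimate = estimate_mappers(total, 52428800)    # reduced max-size (50 MiB)
print(default_estimate, reduced_estimate)
```

This gives 48 for the default setting, which matches what I see, and 984 for the 50 MB setting, which is in the same ballpark as (though higher than) the ~750 mappers I actually get. Either way, each mapper should be reading far less data than before.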
Does anyone know why this is happening? I figured that reducing the input split size (less data per mapper) would get me past this error, but that doesn't seem to be the case.
The file format is Parquet, so it's splittable.