
Increasing Memory for Custom Mapper Python Script in Hive Transform


Hi,

We are running a Hive Transform application with a custom mapper and reducer script written in Python.

Data flows from the mapper to the reducer through standard input.
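For context, the reducer side of such a pipeline looks roughly like the sketch below: it reads tab-separated key/value lines from stdin (as Hive streams them, sorted by key) and emits one aggregated line per key. The column layout and the sum aggregation are illustrative assumptions, not the actual script.

```python
# Hypothetical minimal reducer for a Hive TRANSFORM clause.
# Assumes tab-separated "key<TAB>value" lines arrive on stdin,
# sorted by key, and emits "key<TAB>sum" per key.
import sys

def reduce_lines(lines):
    """Aggregate sorted (key, value) lines; return a list of (key, total)."""
    results = []
    current_key, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                results.append((current_key, total))
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        results.append((current_key, total))
    return results

if __name__ == "__main__":
    for key, total in reduce_lines(sys.stdin):
        print(f"{key}\t{total}")
```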

We are using the settings below for the Hive query.

SET hive.execution.engine=mr;
SET hive.exec.compress.output=false;
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;
SET hive.exec.parallel=true;
SET hive.auto.convert.join=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET mapreduce.map.memory.mb=24576;
SET mapreduce.reduce.memory.mb=32768;
SET mapreduce.reduce.java.opts=-Xmx24576m;
SET yarn.app.mapreduce.am.resource.mb=32768;
SET yarn.nodemanager.vmem-check-enabled=false;

But this job fails with the following error:

Container [pid=****,containerID=container_********] is running beyond physical memory limits. 
Current usage: 28.2 GB of 28 GB physical memory used; 36.2 GB of 58.8 GB virtual memory used. 
Killing container. Dump of the process-tree for container_e207_ : 
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
 |- 87109 87066 87066 87066 (java) 1265 243 9106788352 206905 -server -XX:NewRatio=8 
-Djava.net.preferIPv4Stack=true 
-Dhdp.version=2.6.1.0-129 -Xmx6553m 
-Dlog4j.configuration=container-log4j.properties 
 -Dyarn.app.container.log.filesize=0 
-Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog 
-Dyarn.app.mapreduce.shuffle.logger=INFO,shuffleCLA 
-Dyarn.app.mapreduce.shuffle.logfile=syslog.shuffle 
-Dyarn.app.mapreduce.shuffle.log.filesize=0 
-Dyarn.app.mapreduce.shuffle.log.backups=0 
-Djava.net.preferIPv4Stack=true
-Dhdp.version=2.6.1.0-129  -Xmx6553m 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Dyarn.app.mapreduce.shuffle.logger=INFO,shuffleCLA 
-Dyarn.app.mapreduce.shuffle.logfile=syslog.shuffle 
-Dyarn.app.mapreduce.shuffle.log.filesize=0 
-Dyarn.app.mapreduce.shuffle.log.backups=0 
Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143

How do I increase the memory for the custom reducer Python script? My understanding is that the Hive settings above apply only to the work done inside the JVM. I could not find any setting in Hive that lets me increase the memory available to this Python script.
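My working assumption (a guess, not verified on this cluster) is that in a TRANSFORM query the Python script runs as a child process inside the same YARN container, so its resident memory counts against mapreduce.reduce.memory.mb, while -Xmx bounds only the JVM heap. If that is right, one would raise the container size and keep -Xmx well below it, leaving the difference as headroom for the script, e.g.:

```sql
SET mapreduce.reduce.memory.mb=32768;       -- total container limit (JVM + Python child)
SET mapreduce.reduce.java.opts=-Xmx16384m;  -- smaller heap, ~16 GB left for the script
SET mapreduce.map.memory.mb=24576;
SET mapreduce.map.java.opts=-Xmx12288m;
```

The exact split would depend on how much the Python process actually uses.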
