
LLAP Start Error after increase memory parameters

Contributor

llap-start-error.txt

Increased the following parameters:

HiveServer Interactive Heap Size   10240   to   13251 (MB)
Memory per Daemon                  10240   to   13251 (MB)
LLAP Daemon Heap Size              8162    to   9794  (MB)
In-Memory Cache per Daemon         2048    to   2457  (MB) 
------
Not changed:
Memory allocated for all YARN containers on a node 164 GB
LLAP Daemon Container Max Headroom 1024  (MB)
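As a sanity check on these numbers: HDP's LLAP sizing guidance (paraphrased here; treat the exact rule as an assumption, since Ambari versions differ in how they account for headroom) is that the daemon heap plus the in-memory cache plus the container headroom should fit within "Memory per Daemon". A quick sketch with the values above:

```python
# Hedged sanity check: heap + cache + headroom vs. the daemon container size.
# The sizing rule is an assumption based on HDP LLAP tuning guides; your
# Ambari version may account for headroom differently.

def llap_memory_fits(container_mb, heap_mb, cache_mb, headroom_mb):
    """Return (fits, overflow_mb): whether heap + cache + headroom fit in the container."""
    needed = heap_mb + cache_mb + headroom_mb
    return needed <= container_mb, needed - container_mb

# Values from this thread after the increase:
fits, overflow = llap_memory_fits(container_mb=13251, heap_mb=9794,
                                  cache_mb=2457, headroom_mb=1024)
print(fits, overflow)  # False 24 -> the new settings overshoot the container by 24 MB
```

Under this rule of thumb the increased values no longer fit, which is consistent with the daemons failing to come up.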


stderr: 
2017-12-04 16:19:02,069 - LLAP app 'llap0' deployment unsuccessful.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 680, in <module>
    HiveServerInteractive().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 762, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 123, in start
    raise Fail("Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.")
resource_management.core.exceptions.Fail: Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.

Contributor

Please help with the investigation.

Contributor

LLAP app has failed to start; application_1512395880314_0027 app logs might have a real error, if any. It's also possible that the containers failed to start because either there isn't enough memory physically on the cluster, or there isn't enough space configured in the YARN queue being used.

Cloudera Employee

@dmitro:

It would be better if you could post the app logs or container-level logs; they will have the exact error. To me it seems like a memory issue, and it could be related to the YARN container size.

you can get the app and container level logs this way:

yarn logs -applicationId application_1512395880314_0027

yarn logs -containerId container_e115_1512395880314_0027_01_000006

For a full trace for app and container:

yarn logs -applicationId application_1512395880314_0027 -containerId container_e115_1512395880314_0027_01_000014

Contributor
Thank you very much!

Now I see this error:

"exec /usr/jdk64/jdk1.8.0_77/bin/java -Dproc_llapdaemon -Xms9794m -Xmx9794m -Dhttp.maxConnections=38 -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:TLABSize=8m -XX:+ResizeTLAB -XX:+UseNUMA 
-XX:+AggressiveOpts -XX:MetaspaceSize=1024m -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=600 -Xmx8192m -XX:MetaspaceSize=1024m -server -Djava.net.preferIPv4Stack=true 
-XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps 
-Xloggc:/grid/1/hadoop/yarn/log/application_1512395880314_0027/container_e115_1512395880314_0027_01_000006//gc_2017-12-04-16.log  *** "


Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size   <<<<<<<<<<<<<<<<<
---------------------------------
I think I need to configure "LLAP app java opts" and increase -Xmx8192m to, for example, -Xmx16384m. Is that the maximum value?

"LLAP app java opts" :
-XX:+AlwaysPreTouch {% if java_version > 7 %}-XX:+UseG1GC -XX:TLABSize=8m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
-XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200{% else %}-XX:+PrintGCDetails -verbose:gc 
-XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC{% endif %} -Xmx8192m 
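For what it's worth, the error follows from HotSpot's flag handling: when an option such as -Xmx appears more than once, the last occurrence wins, so the -Xmx8192m appended from "LLAP app java opts" overrides Ambari's -Xmx9794m while -Xms9794m still stands. A small sketch (the helper is hypothetical, not part of any Hive tooling) that reproduces the diagnosis from the command line above:

```python
import re

def effective_heap(cmdline):
    """Return (xms_mb, xmx_mb) using last-occurrence-wins, as HotSpot does."""
    def last(flag):
        vals = re.findall(r'-X%s(\d+)([mMgG])' % flag, cmdline)
        if not vals:
            return None
        num, unit = vals[-1]                       # the last occurrence wins
        return int(num) * (1024 if unit.lower() == 'g' else 1)
    return last('ms'), last('mx')

# Abbreviated java command line from the container log in this thread:
cmd = ("java -Dproc_llapdaemon -Xms9794m -Xmx9794m -XX:+UseG1GC "
       "-XX:InitiatingHeapOccupancyPercent=80 -Xmx8192m -server")
xms, xmx = effective_heap(cmd)
print(xms, xmx, xms > xmx)  # 9794 8192 True -> the VM refuses to start
```

Since the effective Xms (9794m) exceeds the effective Xmx (8192m), the JVM aborts with exactly the "Initial heap size set to a larger value than the maximum heap size" message quoted above.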

Contributor

Note that the command line has two -Xmx values. Ambari has a config value for the LLAP Xmx that it adds to the command line; that is the recommended way to set Xmx. The custom value(s) added to the args seem to conflict with what Ambari adds.

Contributor

The problem was resolved after configuring "LLAP app java opts", and SELECT queries work, but I have an error in the application log:

Application Container Diagnostics:

Container ID: container_e119_1512480218177_0094_01_000002
Component:    LLAP
State:        4
Exit Code:    -104

Container [pid=15416,containerID=container_e119_1512480218177_0094_01_000002] is running beyond physical memory limits. Current usage: 27.0 GB of 26 GB physical memory used; 36.1 GB of 54.6 GB virtual memory used. Killing container.

Dump of the process-tree for container_e119_1512480218177_0094_01_000002:
 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
 |- 15919 1 15416 15416 (java) 148737 3440 38182346752 7082375 /usr/jdk64/jdk1.8.0_77/bin/java -Dproc_llapdaemon -Xms22251m -Xmx22251m -Dhttp.maxConnections=38
    -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:TLABSize=8m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m -XX:InitiatingHeapOccupancyPercent=80
    -XX:MaxGCPauseMillis=200 -Xmx33251m -XX:MetaspaceSize=1024m -server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc
    -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps
    -Xloggc:/grid/11/hadoop/yarn/log/application_1512480218177_0094/container_e119_1512480218177_0094_01_000002//gc_2017-12-05-16.log
    -Djava.io.tmpdir=/grid/6/hadoop/yarn/local/usercache/hive/appcache/application_1512480218177_0094/container_e119_1512480218177_0094_01_000002/tmp/
    -Dlog4j.configurationFile=llap-daemon-log4j2.properties
    -Dllap.daemon.log.dir=/grid/11/hadoop/yarn/log/application_1512480218177_0094/container_e119_1512480218177_0094_01_000002/
    -Dllap.daemon.log.file=llap-daemon-hive-ks-dmp12.kyivstar.ua.log -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=DEBUG
    -classpath /grid/6/hadoop/yarn/local/usercache/hive/appcache/application_1512480218177_0094/container_e119_1512480218177_0094_01_000002/app/install//conf/:
    /grid/6/hadoop/yarn/local/usercache/hive/appcache/application_1512480218177_0094/container_e119_1512480218177_0094_01_000002/app/install//lib/*:
    /grid/6/hadoop/yarn/local/usercache/hive/appcache/application_1512480218177_0094/container_e119_1512480218177_0094_01_000002/app/install//lib/tez/*:
    /grid/6/hadoop/yarn/local/usercache/hive/appcache/application_1512480218177_0094/container_e119_1512480218177_0094_01_000002/app/install//lib/udfs/*:.:
    org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
 |- 15416 15414 15416 15416 (bash) 0 0 118026240 370 /bin/bash -c python ./infra/agent/slider-agent/agent/main.py --label container_e119_1512480218177_0094_01_000002___LLAP
    --zk-quorum ks-dmp03.kyivstar.ua:2181,ks-dmp01.kyivstar.ua:2181,ks-dmp02.kyivstar.ua:2181 --zk-reg-path /registry/users/hive/services/org-apache-slider/llap0
    > /grid/11/hadoop/yarn/log/application_1512480218177_0094/container_e119_1512480218177_0094_01_000002/slider-agent.out 2>&1
 |- 15427 15416 15416 15416 (python) 272 49 459268096 4579 python ./infra/agent/slider-agent/agent/main.py --label container_e119_1512480218177_0094_01_000002___LLAP
    --zk-quorum ks-dmp03.kyivstar.ua:2181,ks-dmp01.kyivstar.ua:2181,ks-dmp02.kyivstar.ua:2181 --zk-reg-path /registry/users/hive/services/org-apache-slider/llap0

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.

Cloudera Employee

@Dmitro Vasilenko

In the error log above, it says it is a memory issue: [pid=15416,containerID=container_e119_1512480218177_0094_01_000002] is running beyond physical memory limits. Current usage: 27.0 GB of 26 GB physical memory used;

I think the memory settings for the LLAP daemon exceed the physically available memory. Please check.

Contributor

Hi! How do I resolve this LLAP error?

[pid=15416,containerID=container_e119_1512480218177_0094_01_000002] is running beyond physical memory limits. Current usage: 27.0 GB of 26 GB physical memory used; 36.1 GB of 54.6 GB virtual memory used. Killing container

Contributor

In the above log, it looks like there are two -Xmx values on the command line: -Xmx22251m and -Xmx33251m. Do you know where the second value comes from? Was one of them specified via args? I'm not sure which one applies (it would be logged in the JMX view of the LLAP daemon), but if the limit is 26 GB and the second value applies, then that is the reason the container exceeds its memory.
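Assuming again that the last -Xmx on the command line wins (HotSpot's behavior for repeated flags), the numbers in the log are consistent with the kill: -Xmx33251m is roughly 32.5 GB, well above the 26 GB container limit, so the heap alone can outgrow the container. A minimal check (hypothetical helper, for illustration only):

```python
# Hedged check: does the effective heap (last -Xmx wins) exceed the YARN
# container's physical memory limit? Values are taken from the log above.

def heap_vs_container(xmx_values_mb, container_limit_mb):
    """Return (effective_xmx_mb, exceeds_limit) assuming the last -Xmx wins."""
    effective = xmx_values_mb[-1]
    return effective, effective > container_limit_mb

# From the log: -Xmx22251m, then -Xmx33251m; container limit is 26 GB.
effective, exceeds = heap_vs_container([22251, 33251], 26 * 1024)
print(effective, exceeds)  # 33251 True -> heap alone can outgrow the container
```

If the first value (22251m) applied instead, the heap would fit, which is why finding the source of the second -Xmx matters.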

Contributor

Hi all! I don't understand why this happens. Error: "is running beyond physical memory limits. Current usage: 21.3 GB of 21 GB physical memory used".

/usr/hdp/current/hive-server2-hive2/bin/hive --service llapstatus


"hostname" : "serv06.kyivstar.ua",
    "containerId" : "container_e122_1512988591180_3785_01_000030",
    "logUrl" : "http://serv06.kyivstar.ua:8042/node/containerlogs/container_e122_1512988591180_3785_01_000030/hive",
    "diagnostics" : "Container [pid=33120,containerID=container_e122_1512988591180_3785_01_000030] 
	is 	running beyond physical memory limits. Current usage: 21.3 GB of 21 GB physical memory used; 30.8 GB of 44.1 GB virtual memory used. Killing container.\nDump 
	of the process-tree for container_e122_1512988591180_3785_01_000030 :\n\t|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) 
	RSSMEM_USAGE(PAGES) FULL_CMD_LINE\n\t|- 33132 33120 33120 33120 (python) 30 6 458362880 4302 python ./infra/agent/slider-agent/agent/main.py --label 
	container_e122_1512988591180_3785_01_000030___LLAP --zk-quorum serv03.kyivstar.ua:2181,serv01.kyivstar.ua:2181,serv02.kyivstar.ua:2181 -
	-zk-reg-path /registry/users/hive/services/org-apache-slider/llap0 \n\t|- 33120 33118 33120 33120 (bash) 0 0 118026240 369 /bin/bash -c python 
	./infra/agent/slider-agent/agent/main.py --label container_e122_1512988591180_3785_01_000030___LLAP --zk-quorum serv03.kyivstar.ua:2181,
	serv01.kyivstar.ua:2181,serv02.kyivstar.ua:2181 --zk-reg-path /registry/users/hive/services/org-apache-slider/llap0 > 
	/grid/11/hadoop/yarn/log/application_1512988591180_3785/container_e122_1512988591180_3785_01_000030/slider-agent.out 2>&1  
	\n\t|- 33184 1 33120 33120 (java) 10487 1162 32513224704 5572712 /usr/jdk64/jdk1.8.0_77/bin/java -Dproc_llapdaemon -Xms18251m -Xmx18251m -Dhttp.maxConnections=19 
	-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:TLABSize=8m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m -XX:InitiatingHeapOccupancyPercent=90 
	-XX:MaxGCPauseMillis=200 -Xmx25251m -XX:MetaspaceSize=1024m -server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
	-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps 
	-Xloggc:/grid/11/hadoop/yarn/log/application_1512988591180_3785/container_e122_1512988591180_3785_01_000030//gc_2017-12-13-17.log -Djava.io.tmpdir=/grid/5/hadoop/yarn/local/
	usercache/hive/appcache/application_1512988591180_3785/container_e122_1512988591180_3785_01_000030/tmp/ -Dlog4j.configurationFile=llap-daemon-log4j2.properties 
	-Dllap.daemon.log.dir=/grid/11/hadoop/yarn/log/application_1512988591180_3785/container_e122_1512988591180_3785_01_000030/ 
	-Dllap.daemon.log.file=llap-daemon-hive-serv06.kyivstar.ua.log -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=ERROR -classpath 
	/grid/5/hadoop/yarn/local/usercache/hive/appcache/application_1512988591180_3785/container_e122_1512988591180_3785_01_000030/app/install//conf/:
	/grid/5/hadoop/yarn/local/usercache/hive/appcache/application_1512988591180_3785/container_e122_1512988591180_3785_01_000030/app/install//lib/*:
	/grid/5/hadoop/yarn/local/usercache/hive/appcache/application_1512988591180_3785/container_e122_1512988591180_3785_01_000030/app/install//lib/tez/*:
	/grid/5/hadoop/yarn/local/usercache/hive/appcache/application_1512988591180_3785/container_e122_1512988591180_3785_01_000030/app/install//lib/udfs/*:.: 
	org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon \n\nContainer killed on request. Exit code is 143\nContainer exited with a non-zero exit code 143. \n",
    "yarnContainerExitStatus" : 0
  } ]
}




LLAP start command: /usr/hdp/current/hive-server2-hive2/bin/hive --service llap --slider-am-container-mb 1024 --size 21251m 
                    --cache 2094m --xmx 18251m --loglevel ERROR  --output /var/lib/ambari-agent/tmp/llap-slider2017-12-13_15-28-50 
					--slider-placement 4 --skiphadoopversion --skiphbasecp --instances 18 --logger query-routing --args " -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:TLABSize=8m 
					-XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m -XX:InitiatingHeapOccupancyPercent=90 -XX:MaxGCPauseMillis=200 -Xmx25251m 
					-XX:MetaspaceSize=1024m"
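Given the earlier replies about duplicate -Xmx values, one thing worth checking in this launch command is whether the -Xmx inside --args conflicts with the --xmx option, since both end up on the daemon's java command line. A small sketch (hypothetical helper, not part of the Hive CLI) that pulls out both values:

```python
import re

def find_xmx_conflict(launch_cmd):
    """Return (cli_xmx_mb, args_xmx_mb) from a 'hive --service llap' command line.

    cli_xmx comes from the --xmx option; args_xmx from any -Xmx inside --args.
    A mismatch means two heap sizes will appear on the daemon's java command.
    """
    cli = re.search(r'--xmx\s+(\d+)m', launch_cmd)      # Ambari-managed value
    args = re.search(r'-Xmx(\d+)m', launch_cmd)         # custom value in --args
    return (int(cli.group(1)) if cli else None,
            int(args.group(1)) if args else None)

# Abbreviated launch command from this post:
cmd = ('hive --service llap --size 21251m --cache 2094m --xmx 18251m '
       '--args " -XX:+UseG1GC -Xmx25251m -XX:MetaspaceSize=1024m"')
print(find_xmx_conflict(cmd))  # (18251, 25251): two conflicting heap sizes
```

Here the stray -Xmx25251m in --args is larger than the --size 21251m container itself, so if it wins, the daemon is guaranteed to exceed the container limit.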


Contributor
##### LLAP Memory ##########
HiveServer Interactive Heap Size (MB)    23251
Memory per Daemon (MB)                   21251
LLAP Daemon Heap Size (MB)               18251
In-Memory Cache per Daemon (MB)          2094
LLAP Daemon Container Max Headroom (MB)  1024