Debugging Apache Slider on CDH 5.3.0 Virtualbox VM


I went through the Slider Memcached Tutorial and was able to package/deploy/start the memcached container successfully. However, when I package up a custom application (basically a Java jar plus dependencies), the container never launches successfully.

 

The application page shows the app in a FINISHED/FAILED state with this diagnostic:

http://quickstart.cloudera:8088/cluster/app/application_1439926335194_0001

 

Diagnostics: Unstable Application Instance : - failed with component MYAPP failed 'recently' 6 times (4 in startup); threshold is 5 - last failure: Failure container_1439926335194_0001_01_000008 on host quickstart.cloudera (0): http://quickstart.cloudera:19888/jobhistory/logs//quickstart.cloudera:8041/container_1439926335194_0...

 

 

Part of the challenge in diagnosing the issue with the container is that the logs disappear after the application completes.

http://quickstart.cloudera:8042/node/containerlogs/container_1439926335194_0001_01_000001/MYUSER

 

There is a troubleshooting page for Slider which indicates that you can persist the logs beyond application completion:

http://slider.incubator.apache.org/docs/troubleshooting.html

 

Configuring YARN for better debugging

One configuration to aid debugging is to tell the nodemanagers to keep data for a short period after containers finish

<!-- 10 minutes after a failure to see what is left in the directory-->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>

 

I found this setting under Yarn - Configuration - NodeManager Base Group - Advanced - Localized Dir Deletion Delay and changed it from the default of 0 to 1200. However, even after I deploy the client config and restart the NodeManager and YARN (and even the VM), the logs are still deleted on container completion.
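
One way to check whether the new delay actually reached a given configuration file is to read the property back from it. The following is a minimal Python sketch, not part of the original post: the helper name `read_hadoop_property` is hypothetical, and which yarn-site.xml the NodeManager really reads on this VM is an assumption you would need to confirm (the deployed client configuration and the copy used by the running daemon may be different files).

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the <value> of the Hadoop <property> called `name`, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Each Hadoop property is a <property> element with <name> and <value> children.
        if (prop.findtext("name", default="") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Usage sketch (the path is an assumption, adjust it to the file the NodeManager uses):
#   with open("/etc/hadoop/conf/yarn-site.xml", "rb") as f:
#       print(read_hadoop_property(f.read(), "yarn.nodemanager.delete.debug-delay-sec"))
```

If this returns None or 0 for the file the NodeManager uses, the change made in the UI has not reached that daemon.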

 

I'm working on the CDH 5.3.0 VirtualBox VM image, and the cluster and services appear to be working normally when I start up the package.

1 ACCEPTED SOLUTION


I found the container logs via the containers web UI (on Cloudera VM it is http://quickstart.cloudera:8042/node/allContainers)

 
There are 2 containers for my application. The first just shows the logs I was looking at earlier, indicating whether the container succeeded or failed; the second has many logs with useful info (command / errors / slider-agent / status_command).
 
They are transient, but I was able to look at them before the application terminated.
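
Because the logs are transient, it can help to list the live containers and their log links from a script while the application is still running. This is a rough Python sketch, assuming the NodeManager REST endpoint `/ws/v1/node/containers` is reachable on the same host and port as the web UI above; the function names are hypothetical, and the field names (`id`, `state`, `containerLogsLink`) should be checked against the response your Hadoop version returns.

```python
import json
from urllib.request import Request, urlopen


def container_log_links(payload):
    """Map container id -> (state, logs link) from a parsed containers response."""
    containers = (payload.get("containers") or {}).get("container") or []
    if isinstance(containers, dict):
        # Defensive: treat a single container serialized as an object like a list of one.
        containers = [containers]
    return {c["id"]: (c.get("state"), c.get("containerLogsLink")) for c in containers}


def fetch_containers(base_url="http://quickstart.cloudera:8042"):
    """Fetch the container list from the NodeManager (assumed REST endpoint)."""
    request = Request(base_url + "/ws/v1/node/containers",
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage sketch: print the log link of every container currently known to the NodeManager.
#   for cid, (state, link) in container_log_links(fetch_containers()).items():
#       print(cid, state, link)
```

Running this in a loop while the application starts would give you the log URLs before the containers are cleaned up.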
 
slider-agent.out just has this line in it:
 
No handlers could be found for logger "root"
 
However, slider-agent.log gave me the info I was looking for: the stderr / stdout from executing the Java command line, which is very helpful.
 
INFO 2015-08-19 14:07:28,422 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [],
 'reports': [{'actionId': u'4-1',
              'clusterName': u'myapp1',
              'exitcode': 1,
              'reportResult': True,
              'role': u'MYAPP',
              'roleCommand': u'START',
              'serviceName': u'myapp1',
              'status': 'FAILED',
              'stderr': '2015-08-19 14:07:28,268 - Error while executing command ...<removed for brevity>,
              'stdout': '2015-08-19 14:07:23,261 - Execute[\'/usr/java/latest/bin/java -Xmx256m -classpath ...<removed for brevity>,
              'structuredOut': '{}',
              'taskId': 4}]}
Locating the container logs put me on the path to solving this.
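
If you save a copy of slider-agent.log before it disappears, the command reports can be summarized with a few lines of Python. This is only a sketch based on the single log excerpt shown above: the function name `summarize_reports` is hypothetical, and it assumes every report is logged after a "Queue result:" marker with the same field layout.

```python
import re

# Matches the scalar fields of interest, e.g. 'exitcode': 1 or 'role': u'MYAPP'.
FIELD = re.compile(r"'(exitcode|role|roleCommand|status)':\s*(?:u?'([^']*)'|(-?\d+))")


def summarize_reports(log_text):
    """Return one dict of exitcode/role/roleCommand/status per 'Queue result:' entry."""
    summaries = []
    for chunk in log_text.split("Queue result:")[1:]:
        fields = {}
        for key, text, number in FIELD.findall(chunk):
            if key not in fields:  # keep the first occurrence within each entry
                fields[key] = int(number) if number else text
        summaries.append(fields)
    return summaries


# Usage sketch (the file name is whatever you saved the log as):
#   with open("slider-agent.log") as f:
#       for report in summarize_reports(f.read()):
#           if report.get("status") == "FAILED":
#               print(report)
```

The full stderr and stdout strings are still best read directly in the log; this only helps spot which role and command failed.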


3 REPLIES 3


The only error I see in the log is this:

 

Role instance RoleInstance failed 

 

2015-08-19 10:59:21,819 [AMRM Callback Handler Thread] ERROR appmaster.SliderAppMaster - Role instance RoleInstance{role='MYAPP', id='container_1439926335194_0002_01_000003', container=ContainerID=container_1439926335194_0002_01_000003 nodeID=quickstart.cloudera:8041 http=quickstart.cloudera:8042 priority=1073741825 resource=<memory:1024, vCores:1>, createTime=1440007115649, startTime=1440007115674, released=false, roleId=1, host=quickstart.cloudera, hostURL=http://quickstart.cloudera:8042, state=5, placement=null, exitCode=0, command='python ./infra/agent/slider-agent/agent/main.py --label container_1439926335194_0002_01_000003___MYAPP --zk-quorum localhost:2181 --zk-reg-path /registry/users/myuser/services/org-apache-slider/myapp1> /slider-agent.out 2>&1 ; ', diagnostics='', output=null, environment=[LANGUAGE="en_US.UTF-8", AGENT_WORK_ROOT="$PWD", HADOOP_USER_NAME="C4", AGENT_LOG_ROOT="", PYTHONPATH="./infra/agent/slider-agent/", LC_ALL="en_US.UTF-8", SLIDER_PASSPHRASE="<redacted>", LANG="en_US.UTF-8"]} failed


Thank you for sharing the steps you took dr3x. Hopefully it will help others who face a similar issue in the future. 


Cy Jervis, Manager, Community Program