Debugging Apache Slider on CDH 5.3.0 VirtualBox VM

Explorer

I went through the Slider Memcached tutorial and was able to package, deploy, and start the memcached container successfully. However, when I package a custom application (basically a Java jar plus its dependencies), the container never launches successfully.

 

The application page shows the app in a FINISHED/FAILED state with this diagnostic:

http://quickstart.cloudera:8088/cluster/app/application_1439926335194_0001

 

Diagnostics: Unstable Application Instance : - failed with component MYAPP failed 'recently' 6 times (4 in startup); threshold is 5 - last failure: Failure container_1439926335194_0001_01_000008 on host quickstart.cloudera (0): http://quickstart.cloudera:19888/jobhistory/logs//quickstart.cloudera:8041/container_1439926335194_0...

 

 

Part of the challenge in diagnosing the issue with the container is that the logs disappear after the application completes.

http://quickstart.cloudera:8042/node/containerlogs/container_1439926335194_0001_01_000001/MYUSER

 

There is a troubleshooting page for Slider that says you can persist the logs beyond application completion:

http://slider.incubator.apache.org/docs/troubleshooting.html

 

Configuring YARN for better debugging

One configuration to aid debugging is tell the nodemanagers to keep data for a short period after containers finish

<!-- 10 minutes after a failure to see what is left in the directory-->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>

 

I found this setting under YARN - Configuration - NodeManager Base Group - Advanced - Localized Dir Deletion Delay and changed it from the default of 0 to 1200. However, even after deploying the client configuration, restarting the NodeManager and YARN, and even restarting the VM, the logs are still deleted on container completion.
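One way to check whether the changed value actually reached the NodeManager is to read the property back out of the yarn-site.xml that the running NodeManager uses. Below is a minimal sketch in Python, assuming you have located that file yourself (its path on this VM is not stated in this thread). The helper name `read_hadoop_property` and the `SAMPLE` text are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the <value> of the named Hadoop <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Each property is a <name>/<value> pair inside <configuration>.
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative input only: the same property quoted from the Slider docs.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>600</value>
  </property>
</configuration>"""

delay = read_hadoop_property(SAMPLE, "yarn.nodemanager.delete.debug-delay-sec")
print("debug delay (seconds):", delay)
```

To use it against a real file, read the file's contents and pass them in place of `SAMPLE`. If the function returns None or 0 for the file the NodeManager reads, the change made in the UI has not reached that process.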

 

I'm working on the CDH 5.3.0 VirtualBox VM image, and the cluster and services appear to be working normally when I start the package.

1 ACCEPTED SOLUTION

Explorer

I found the container logs via the containers web UI (on the Cloudera VM it is http://quickstart.cloudera:8042/node/allContainers).

 
There are two containers for my application. The first just shows the logs I was looking at earlier, indicating whether the container succeeded or failed; the second has many logs with useful information (command / errors / slider-agent / status_command).
 
They are transient, but I was able to look at them before the application terminated.
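Since the logs only exist while the application is running, one option is to copy them somewhere else while they are still there. This is a rough sketch, assuming you have shell access to the NodeManager host, read permission on the container's log directory, and Python 3.8 or later. The function name `snapshot_logs` and its parameters are hypothetical, and the on-disk log directory is not given in this thread, so you would need to supply it.

```python
import shutil
import time
from pathlib import Path


def snapshot_logs(src, dest, interval=2.0, rounds=30):
    """Repeatedly copy src into dest so the files outlive cleanup of src.

    Returns the number of rounds in which src existed and a copy was tried.
    """
    src, dest = Path(src), Path(dest)
    attempts = 0
    for i in range(rounds):
        if src.is_dir():
            try:
                # dirs_exist_ok (Python 3.8+) lets later rounds refresh the
                # earlier copy instead of failing because dest exists.
                shutil.copytree(src, dest, dirs_exist_ok=True)
            except OSError:
                # Files may vanish mid-copy (shutil.Error is an OSError
                # subclass); keep whatever was copied so far.
                pass
            attempts += 1
        if i < rounds - 1:
            time.sleep(interval)
    return attempts
```

Start it right after launching the application, with `src` pointing at the container's log directory and `dest` at a directory you own. The last successful copy remains in `dest` after the originals are removed.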
 
slider-agent.out just has this line in it:
 
No handlers could be found for logger "root"
 
However, slider-agent.log gave me the information I was looking for: the stderr and stdout from executing the Java command line, which is very helpful.
 
INFO 2015-08-19 14:07:28,422 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [],
 'reports': [{'actionId': u'4-1',
              'clusterName': u'myapp1',
              'exitcode': 1,
              'reportResult': True,
              'role': u'MYAPP',
              'roleCommand': u'START',
              'serviceName': u'myapp1',
              'status': 'FAILED',
              'stderr': '2015-08-19 14:07:28,268 - Error while executing command ...<removed for brevity>,
              'stdout': '2015-08-19 14:07:23,261 - Execute[\'/usr/java/latest/bin/java -Xmx256m -classpath ...<removed for brevity>,
              'structuredOut': '{}',
              'taskId': 4}]}
Locating the container logs put me on the path to solving this.
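The "Queue result" entry in slider-agent.log appears to be printed as a Python dict literal, so the failed reports can be pulled out of a copied entry without reading the whole log by eye. Here is a minimal sketch, assuming the entry is complete and well formed (the excerpt above was shortened by hand, so it would not parse as shown). `failed_reports` is a hypothetical helper, and the sample keeps the "<removed for brevity>" placeholders rather than real output.

```python
import ast


def failed_reports(entry):
    """Return the FAILED reports from one 'Queue result' log entry."""
    marker = "Queue result: "
    start = entry.find(marker)
    if start == -1:
        return []
    try:
        # literal_eval only accepts Python literals, so it is safe to run
        # on log text; the u'...' prefixes are valid in Python 3.
        payload = ast.literal_eval(entry[start + len(marker):])
    except (ValueError, SyntaxError):
        return []  # truncated or malformed entry
    return [r for r in payload.get("reports", []) if r.get("status") == "FAILED"]


# Shortened sample with placeholder stderr/stdout values.
SAMPLE_ENTRY = """INFO 2015-08-19 14:07:28,422 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [],
 'reports': [{'actionId': u'4-1',
              'exitcode': 1,
              'role': u'MYAPP',
              'roleCommand': u'START',
              'status': 'FAILED',
              'stderr': '<removed for brevity>',
              'stdout': '<removed for brevity>',
              'structuredOut': '{}',
              'taskId': 4}]}"""

for report in failed_reports(SAMPLE_ENTRY):
    print(report["role"], report["roleCommand"], "exit code", report["exitcode"])
    print(report["stderr"])
```

In a real entry the `stderr` and `stdout` fields hold the output of the Java command, which is the part worth reading first.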

3 REPLIES

Explorer

The only error I see in the log is this:

 

Role instance RoleInstance failed 

 

2015-08-19 10:59:21,819 [AMRM Callback Handler Thread] ERROR appmaster.SliderAppMaster - Role instance RoleInstance{role='MYAPP', id='container_1439926335194_0002_01_000003', container=ContainerID=container_1439926335194_0002_01_000003 nodeID=quickstart.cloudera:8041 http=quickstart.cloudera:8042 priority=1073741825 resource=<memory:1024, vCores:1>, createTime=1440007115649, startTime=1440007115674, released=false, roleId=1, host=quickstart.cloudera, hostURL=http://quickstart.cloudera:8042, state=5, placement=null, exitCode=0, command='python ./infra/agent/slider-agent/agent/main.py --label container_1439926335194_0002_01_000003___MYAPP --zk-quorum localhost:2181 --zk-reg-path /registry/users/myuser/services/org-apache-slider/myapp1> /slider-agent.out 2>&1 ; ', diagnostics='', output=null, environment=[LANGUAGE="en_US.UTF-8", AGENT_WORK_ROOT="$PWD", HADOOP_USER_NAME="C4", AGENT_LOG_ROOT="", PYTHONPATH="./infra/agent/slider-agent/", LC_ALL="en_US.UTF-8", SLIDER_PASSPHRASE="<redacted>", LANG="en_US.UTF-8"]} failed

Community Manager

Thank you for sharing the steps you took, dr3x. Hopefully it will help others who face a similar issue in the future.


Cy Jervis, Manager, Community Program