We ran into an interesting issue last week. Posting it here for a wider audience, as it might help others too.


On one of the nodes, the DataNode and NodeManager were not coming up. Below is the error reported after starting them from Ambari.

resource_management.core.exceptions.Fail: Execution of ' su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/ --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. starting datanode, logging to /var/log/hadoop/hdfs/
Error: Could not find or load main class org.apache.hadoop.hdfs.server.datanode.DataNode

Because the DataNode process itself never loaded, nothing was written to the DataNode log. The only thing in the .out file was:

Error: Could not find or load main class org.apache.hadoop.hdfs.server.datanode.DataNode

We verified that the DataNode class was present in the jar:

/usr/jdk64/jdk1.8.0_77/bin/jar  -tvf /usr/hdp/ | grep DataNode.class 
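
The same check can be done from Java, which is handy when scripting it across several jars. The following is only a minimal sketch: the class name JarClassCheck and its methods are made up for illustration, and the jar path is whatever you pass on the command line. Keep in mind that finding DataNode.class in a jar only shows that this one entry exists; the classes it depends on (superclasses, interfaces) still have to be loadable from the classpath.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Hadoop): checks whether a jar has an entry for a class.
public class JarClassCheck {

    // Convert a fully qualified class name to the entry path used inside a jar.
    public static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Return true if the jar at jarPath contains an entry for className.
    public static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(toEntryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <jar> <fully.qualified.ClassName>");
            System.exit(2);
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + args[0]);
    }
}
```

It would be run as: java JarClassCheck <path-to-jar> org.apache.hadoop.hdfs.server.datanode.DataNode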



@nvadivelu came to the rescue. We used the utility below to figure out which class was missing.

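public class Sample {
    public static void main(String[] args) {
        try {
            // ... code that references org.apache.hadoop.hdfs.server.datanode.DataNode
        } catch (Throwable ex) {
            ex.printStackTrace();
        }
    }
}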

We compiled the above code against the Hadoop classpath, and javac reported the exact class that could not be found.

/usr/jdk64/jdk1.8.0_77/bin/javac -cp `hadoop classpath`
error: cannot access TraceAdminProtocol
  class file for org.apache.hadoop.tracing.TraceAdminProtocol not found
1 error
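
A run-time variant of the same idea, for cases where compiling on the node is inconvenient, is sketched below. It is not the utility we used: ClassLoadProbe and tryLoad are hypothetical names, and it relies only on the standard Class.forName call. It asks the JVM to load a class by name without running its static initializers and prints the chain of errors if loading fails. When a superclass or interface is missing, the error message should name it.

```java
// Hypothetical probe (not part of Hadoop): loads a class by name and reports why loading failed.
public class ClassLoadProbe {

    // Try to load className without running its static initializers.
    // Returns null on success, or the Throwable that stopped the load.
    public static Throwable tryLoad(String className) {
        try {
            Class.forName(className, false, ClassLoadProbe.class.getClassLoader());
            return null;
        } catch (Throwable ex) {
            return ex;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ClassLoadProbe <fully.qualified.ClassName>");
            System.exit(2);
        }
        Throwable failure = tryLoad(args[0]);
        if (failure == null) {
            System.out.println("Loaded OK: " + args[0]);
            return;
        }
        // Walk the cause chain; the innermost entry usually names the missing class.
        for (Throwable t = failure; t != null; t = t.getCause()) {
            System.out.println(t.getClass().getName() + ": " + t.getMessage());
        }
        System.exit(1);
    }
}
```

Assuming the compiled class sits in the current directory, it could be run as: java -cp `hadoop classpath`:. ClassLoadProbe org.apache.hadoop.hdfs.server.datanode.DataNode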

The TraceAdminProtocol class belongs in the hadoop-common jar. We grepped for it in the hadoop-common jar on the failing host and did not find it. On another host, where the DataNode was running fine, the same grep gave the result below.

grep "TraceAdminProtocol" /usr/hdp/ 

Binary file /usr/hdp/ matches

We also verified that this jar was smaller than the one on the working host.
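
A size difference is a quick hint, but a checksum is a more reliable way to tell whether two jars with the same name and version are really identical. The sketch below is an illustration only: JarFingerprint and sha256Hex are hypothetical names, and SHA-256 is simply one reasonable choice of digest. It prints the size and digest of each file given on the command line, so the output from the failing host can be compared with the output from a healthy one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Hadoop): prints size and SHA-256 of each file argument.
public class JarFingerprint {

    // Stream the file through SHA-256 and return the digest as lowercase hex.
    public static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            Path file = Paths.get(arg);
            System.out.println(Files.size(file) + " bytes  " + sha256Hex(file) + "  " + file);
        }
    }
}
```

Run it with the jar path on each host (java JarFingerprint <path-to-jar>); any difference in the digest means the files are not the same, even if the version in the file name matches.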


We copied the jar over from the working host, and the DataNode and NodeManager came up fine. We never found out where the bad jar came from, even though it carried the same version. Still, it was a good learning experience.