This was an interesting issue we faced last week. I'm posting it here for a wider audience, as it might help others too.

PROBLEM

On one of the nodes, the DataNode and NodeManager were not coming up. Below is the error we got when starting them from Ambari:

resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-ny-node3.hwxblr.com.out
Error: Could not find or load main class org.apache.hadoop.hdfs.server.datanode.DataNode

Because the DataNode process itself never loaded, nothing was printed in the DataNode logs. The only thing we saw in the .out file was:

Error: Could not find or load main class org.apache.hadoop.hdfs.server.datanode.DataNode

We verified that the DataNode class was present in the hadoop-hdfs jar:

/usr/jdk64/jdk1.8.0_77/bin/jar  -tvf /usr/hdp/2.5.0.0-1245/hadoop-hdfs/hadoop-hdfs-2.7.3.2.5.0.0-1245.jar | grep DataNode.class 

org/apache/hadoop/hdfs/server/datanode/DataNode.class
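The same check can be scripted without parsing `jar -tvf` output. Below is a minimal Java sketch that uses only `java.util.jar.JarFile` from the JDK; the class name `JarEntryCheck` and its command-line usage are hypothetical, not part of Hadoop or HDP. It maps a fully qualified class name to the entry path a class file would have inside the jar and reports whether that entry exists.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar contains a given class file.
public class JarEntryCheck {

    /**
     * Returns true if the jar at jarPath has an entry for className,
     * e.g. "org.apache.hadoop.hdfs.server.datanode.DataNode".
     */
    static boolean containsClass(String jarPath, String className) throws IOException {
        // Class files are stored under a path that mirrors the package name.
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarEntryCheck <jar> <fully.qualified.ClassName>");
            System.exit(2);
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + args[0]);
    }
}
```

For example, `java JarEntryCheck /usr/hdp/2.5.0.0-1245/hadoop-hdfs/hadoop-hdfs-2.7.3.2.5.0.0-1245.jar org.apache.hadoop.hdfs.server.datanode.DataNode` should report the class as found on both hosts, since the hadoop-hdfs jar was not the problem here.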

ROOT CAUSE

@nvadivelu came to the rescue. We used the utility below to figure out which class was missing.

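public class Sample {
    public static void main(String[] args) {
        try {
            // Call the DataNode entry point directly so a load failure is reported in detail.
            org.apache.hadoop.hdfs.server.datanode.DataNode.main(args);
        } catch (Throwable ex) {
            // Catch Throwable, not Exception: NoClassDefFoundError is an Error subclass.
            ex.printStackTrace();
        }
    }
}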

We compiled the above code against the Hadoop classpath, and javac itself reported the exact class that could not be loaded:

/usr/jdk64/jdk1.8.0_77/bin/javac -cp `hadoop classpath` Sample.java
Sample.java:5: error: cannot access TraceAdminProtocol
org.apache.hadoop.hdfs.server.datanode.DataNode.main(args);
                                               ^
  class file for org.apache.hadoop.tracing.TraceAdminProtocol not found
1 error
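Compiling against DataNode surfaces the problem because javac has to read the types DataNode depends on to compile the call. A runtime alternative, sketched below under the assumption that it is launched with the same classpath as the daemon (for example ``java -cp "`hadoop classpath`:." ClassProbe org.apache.hadoop.hdfs.server.datanode.DataNode``), is to load the class reflectively and print the error the JVM raises. `ClassProbe` is a hypothetical helper, not a Hadoop tool. It only catches types needed to load the class itself (its superclass and interfaces); a dependency used only inside a method body would not show up until that code runs.

```java
// Hypothetical helper: loads classes by name and reports why loading fails.
public class ClassProbe {

    /**
     * Tries to load className without running its static initializers.
     * Returns null on success, or a short description of the failure.
     */
    static String probe(String className) {
        try {
            // initialize=false skips static initializers, but loading the class
            // still requires its superclass and interfaces to be loadable.
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return null;
        } catch (ClassNotFoundException e) {
            // The requested class itself is not on the classpath.
            return "class not found: " + e.getMessage();
        } catch (LinkageError e) {
            // For example NoClassDefFoundError, whose message names the
            // dependency that could not be found.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            String problem = probe(name);
            System.out.println(name + ": " + (problem == null ? "OK" : problem));
        }
    }
}
```

Using `initialize=false` is a deliberate choice: it avoids running the daemon's static setup code just to test whether its classes resolve.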

The TraceAdminProtocol class should be present in the hadoop-common jar. When we grepped for it in the hadoop-common jar on the failing host, we didn't find it. On another host, where the DataNode was running fine, we got the result below.

grep "TraceAdminProtocol" /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common-2.7.3.2.5.0.0-1245.jar 

Binary file /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common-2.7.3.2.5.0.0-1245.jar matches

We also verified that this jar was smaller than the one on the working host.
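To see exactly what was lost, rather than only the size difference, the two copies can be compared entry by entry. Below is a minimal sketch using `java.util.jar.JarFile` (Java 8+, matching the jdk1.8 in use here). `JarDiff` is a hypothetical name, and the sketch assumes the suspect jar is still readable as a zip archive; a badly truncated jar may fail to open at all, which would itself be a strong hint.

```java
import java.io.File;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists entries present in a reference jar but missing from a suspect jar.
public class JarDiff {

    /** Collects the names of all entries in a jar, sorted. */
    static Set<String> entryNames(String jarPath) throws IOException {
        Set<String> names = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            jar.stream().map(JarEntry::getName).forEach(names::add);
        }
        return names;
    }

    /** Returns entries found in referenceJar that are absent from suspectJar. */
    static Set<String> missingEntries(String referenceJar, String suspectJar) throws IOException {
        Set<String> missing = entryNames(referenceJar);
        missing.removeAll(entryNames(suspectJar));
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarDiff <jar-from-good-host> <jar-from-bad-host>");
            System.exit(2);
        }
        System.out.printf("sizes: %d vs %d bytes%n",
                new File(args[0]).length(), new File(args[1]).length());
        for (String name : missingEntries(args[0], args[1])) {
            System.out.println("missing: " + name);
        }
    }
}
```

After copying the working host's hadoop-common jar to a temporary location, each `missing:` line would name a file, such as `org/apache/hadoop/tracing/TraceAdminProtocol.class`, that exists in the good copy but not in the bad one.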

RESOLUTION

We copied this jar from the working host, and the DataNode and NodeManager came up fine. We have no idea where the bad jar came from, given that it carried the same version number as the good one. But it was a good learning experience.
