Created on 01-04-2017 01:07 PM
This was an interesting issue we faced last week. Posting it here for a wider audience, since it might be helpful to others too.
PROBLEM
On one of the nodes, the DataNode and NodeManager were not coming up. Below is the error we got after starting them from Ambari:
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-ny-node3.hwxblr.com.out
Error: Could not find or load main class org.apache.hadoop.hdfs.server.datanode.DataNode
Because the DataNode process itself never loaded, nothing was written to the DataNode logs. The only thing we saw in the .out file was:
Error: Could not find or load main class org.apache.hadoop.hdfs.server.datanode.DataNode
We verified that the DataNode class was present in the jar:
/usr/jdk64/jdk1.8.0_77/bin/jar -tvf /usr/hdp/2.5.0.0-1245/hadoop-hdfs/hadoop-hdfs-2.7.3.2.5.0.0-1245.jar | grep DataNode.class
org/apache/hadoop/hdfs/server/datanode/DataNode.class
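The same check can be scripted instead of piping jar -tvf through grep. Below is a minimal Java sketch, not something we ran during this investigation, that opens a jar with java.util.jar.JarFile and looks up the entry path for a fully qualified class name. JarClassCheck is a hypothetical helper name.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: reports whether a jar contains the .class entry for a class,
// roughly what `jar -tvf <jar> | grep <Class>.class` checks by hand.
final class JarClassCheck {

    private JarClassCheck() {}

    /** Converts a binary class name such as org.example.Foo into its jar entry path. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has an entry for className. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarClassCheck <jar> <fully.qualified.ClassName>
        if (args.length != 2) {
            System.err.println("usage: JarClassCheck <jar> <class>");
            System.exit(2);
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println((found ? "FOUND: " : "MISSING: ") + entryName(args[1]));
        System.exit(found ? 0 : 1);
    }
}
```

The non-zero exit code on a missing class makes it easy to loop over several hosts from a shell script.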
ROOT CAUSE
@nvadivelu came to the rescue. We used the utility below to figure out which class was missing.
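public class Sample {
    public static void main(String[] args) {
        // Call DataNode.main() directly so the real missing class is reported, not the launcher's generic error
        try {
            org.apache.hadoop.hdfs.server.datanode.DataNode.main(args);
        } catch (Throwable ex) {
            // Print the full stack trace, including any NoClassDefFoundError and its cause
            ex.printStackTrace();
        }
    }
}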
When we compiled the above code against the Hadoop classpath, the compiler reported the exact class that could not be loaded.
/usr/jdk64/jdk1.8.0_77/bin/javac -cp `hadoop classpath` Sample.java
Sample.java:5: error: cannot access TraceAdminProtocol
            org.apache.hadoop.hdfs.server.datanode.DataNode.main(args);
                                                           ^
  class file for org.apache.hadoop.tracing.TraceAdminProtocol not found
1 error
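As far as we can tell, the JDK 8 launcher prints the generic "Could not find or load main class" message even when the main class itself exists but a class it needs while loading (for example, an interface it implements) is missing, and it hides the underlying NoClassDefFoundError. A reusable variant of the Sample utility could look like the sketch below. ClassLoadProbe is a hypothetical name; it calls Class.forName with initialize=false so that static initializers do not run, and it reports the innermost cause of any failure.

```java
// Hypothetical diagnostic: try to load a class by name without running its static
// initializers, and report the innermost cause if loading fails.
final class ClassLoadProbe {

    private ClassLoadProbe() {}

    /** Returns null if className loads cleanly, otherwise "ExceptionClass: message" of the root cause. */
    static String probe(String className) {
        try {
            // initialize=false: load the class (which requires its superclass and
            // interfaces to be found too) but skip static initializers.
            Class.forName(className, false, ClassLoadProbe.class.getClassLoader());
            return null;
        } catch (Throwable t) {
            // Walk the cause chain down to the original failure.
            Throwable root = t;
            while (root.getCause() != null && root.getCause() != root) {
                root = root.getCause();
            }
            return root.getClass().getName() + ": " + root.getMessage();
        }
    }

    public static void main(String[] args) {
        // Usage: java -cp ".:$(hadoop classpath)" ClassLoadProbe <class> [<class> ...]
        for (String name : args) {
            String result = probe(name);
            System.out.println(name + " -> " + (result == null ? "OK" : result));
        }
    }
}
```

Run against org.apache.hadoop.hdfs.server.datanode.DataNode on the broken host, this should name the missing class directly, without having to compile a caller against it.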
The TraceAdminProtocol class should be present in the hadoop-common jar. When we grepped for this class in the hadoop-common jar on the failing host, we did not find it. On another host, where the DataNode was running fine, we got the result below:
grep "TraceAdminProtocol" /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common-2.7.3.2.5.0.0-1245.jar
Binary file /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common-2.7.3.2.5.0.0-1245.jar matches
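Instead of grepping for one class at a time, one could list every entry that exists in the known-good jar but not in the suspect copy. This is a minimal sketch, assuming both jars have been copied to one machine; JarDiff is a hypothetical helper name.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists entries present in a known-good jar but missing from a suspect jar.
final class JarDiff {

    private JarDiff() {}

    /** Returns all entry names in the jar, sorted. */
    static Set<String> entryNames(String jarPath) throws IOException {
        Set<String> names = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Entries in goodJar that do not appear in suspectJar. */
    static Set<String> missingEntries(String goodJar, String suspectJar) throws IOException {
        Set<String> missing = entryNames(goodJar);
        missing.removeAll(entryNames(suspectJar));
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarDiff <good.jar> <suspect.jar>
        for (String name : missingEntries(args[0], args[1])) {
            System.out.println("missing: " + name);
        }
    }
}
```

Comparing entry names rather than grepping raw bytes also avoids false matches on strings that merely appear inside other compressed entries.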
We also verified that this jar was smaller than the one on the working host.
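Comparing sizes caught this case, but a checksum would also catch a damaged jar that happens to keep the same size. The sketch below is a hypothetical JarFingerprint helper that prints the size and SHA-256 digest of each file passed on the command line, so the output can be compared across hosts (running sha256sum on each host does the same job).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: fingerprints jars by size and SHA-256 so copies on different hosts can be compared.
final class JarFingerprint {

    private JarFingerprint() {}

    /** Streams the file through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Cheap size check first, then a full digest comparison. */
    static boolean sameContent(Path a, Path b) throws IOException {
        return Files.size(a) == Files.size(b) && sha256Hex(a).equals(sha256Hex(b));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarFingerprint <file> [<file> ...]
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(Files.size(p) + "  " + sha256Hex(p) + "  " + p);
        }
    }
}
```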
RESOLUTION
We copied this jar from the working host, and the DataNode and NodeManager came up fine. We have no clue where the bad jar came from, since it carried the same version number. Still, it was a good learning experience.