Created on 02-10-2016 11:12 AM - edited 08-17-2019 01:17 PM
I just completed my first Express Upgrade (EU) using Ambari-2.2.0, from HDP-2.2.8 to HDP-2.3.4, and here are my observations and the issues I encountered. The cluster has 12 nodes, 2 masters and 10 workers, with NameNode HA and RM HA configured, running on RHEL-6.5 using Java-7. Installed Hadoop components: HDFS, MR2, YARN, Hive, Tez, HBase, Pig, Sqoop, Oozie, ZooKeeper, and Ambari Metrics. About 2 weeks before this EU, the cluster was upgraded from HDP-2.1.10 and Ambari-1.7.1. Please use this only as a reference: depending on cluster settings and history (previous upgrade or fresh install), the issues will differ, and the problems I had should by no means be considered representative or expected during every EU.
2016-02-05 13:16:52,503 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(540)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/000035.sst
And indeed, that directory contained 000040.sst but sadly no 000035.sst. I realized that this was my yarn.nodemanager.recovery.dir, and because YARN NM recovery was enabled, the NM tried to restore the state it had before it was stopped. All our jobs had been stopped and we didn't care about recovering NM state, so after backing up the directory I deleted all files in it and tried to start the NM manually. Luckily, that worked! The command to start an NM manually, as Ambari does it, run as the yarn user:
$ ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec && /usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh --config /usr/hdp/current/hadoop-client/conf start nodemanager
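The backup-and-clear step I did by hand before starting the NM could be scripted roughly as follows. This is only a sketch: the function name backup_and_clear_nm_state and the default backup location are made up for illustration, and the example path is the recovery directory seen in my log, so check your own yarn.nodemanager.recovery.dir first. Run it only with the NodeManager stopped and only if you don't need the old NM state.

```shell
#!/usr/bin/env bash
# Sketch: back up the NodeManager recovery directory, then empty it.
# The function name and the default backup location are illustrative
# assumptions, not something Ambari or HDP provides.

backup_and_clear_nm_state() {
    # $1: value of yarn.nodemanager.recovery.dir on this host
    local recovery_dir="$1"
    # $2: optional backup target; defaults to a timestamped sibling directory
    local backup_dir="${2:-${recovery_dir}.bak.$(date +%Y%m%d%H%M%S)}"

    if [ ! -d "$recovery_dir" ]; then
        echo "No such directory: $recovery_dir" >&2
        return 1
    fi

    # Keep a full copy first so the old state can be inspected or restored.
    cp -a "$recovery_dir" "$backup_dir" || return 1

    # Delete everything inside the recovery directory but keep the directory itself.
    find "$recovery_dir" -mindepth 1 -delete || return 1

    # Print where the backup went.
    echo "$backup_dir"
}

# Example (as the yarn user, NodeManager stopped):
# backup_and_clear_nm_state /var/log/hadoop-yarn/nodemanager/recovery-state
```

After that, the NM can be started with the command above.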
$ ulimit -c unlimited; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode -rollingUpgrade started
$ hdfs dfs -chmod -R 755 /user/oozie/share/lib_20160205182129
JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
HADOOP_HOME=${HADOOP_HOME:-/usr}
if [ -d "/usr/lib/tez" ]; then
  PIG_OPTS="$PIG_OPTS -Dmapreduce.framework.name=yarn"
fi
templeton.libjars=/usr/hdp/${hdp.version}/zookeeper/zookeeper.jar,/usr/hdp/${hdp.version}/hive/lib/hive-common.jar