Dear Friends,
We need your help. We recently updated the domain name/IP in our Kerberos Active Directory authentication settings in Cloudera Manager (CM). Since then, we have the following health issues and cannot start our cluster or the CM service. Any help is much appreciated!
We have two NameNodes, one of which was added for High Availability.
Thanks very much in advance, and please let me know if you have any questions.
Kind regards
Andy
For YARN (JobHistory Server), here is the error:
This role's process exited. This role is supposed to be started.
Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/1/dfs/nn is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:313)
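Since the exception says the storage directory "does not exist or is not accessible", one simple first check is whether the directory is present and usable on the NameNode host. The following is only a minimal sketch: the function name check_storage_dir is made up for illustration, and the result reflects the permissions of whichever user runs the script, so it would need to run as the account that starts the NameNode (often hdfs, but that is an assumption about the setup).

```python
import os


def check_storage_dir(path):
    """Return a list of problems found with a storage directory (empty if none)."""
    if not os.path.exists(path):
        return ["does not exist"]
    if not os.path.isdir(path):
        return ["is not a directory"]
    problems = []
    # Check read, write, and traverse (execute) permission for the current user.
    checks = ((os.R_OK, "readable"), (os.W_OK, "writable"), (os.X_OK, "traversable"))
    for flag, name in checks:
        if not os.access(path, flag):
            problems.append("is not " + name + " by the current user")
    return problems


if __name__ == "__main__":
    # Path taken from the exception message above.
    path = "/data/1/dfs/nn"
    issues = check_storage_dir(path)
    print(path, "looks accessible" if not issues else "; ".join(issues))
```

An empty result only rules out the most basic causes (missing directory or wrong ownership/permissions); it does not validate the contents of the directory.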
Here is the list of health issues:
cluster
HBase: HBase Master Health
catalogserver (name_node): StateStore Connectivity
impalad (first_data_node): StateStore Connectivity, Impala Daemon Ready Check, Web Server Status
impalad (2nd_data_node): StateStore Connectivity, Impala Daemon Ready Check, Unexpected Exits, Web Server Status
impalad (3rd_data_node): StateStore Connectivity, Impala Daemon Ready Check, Unexpected Exits, Web Server Status
jobhistory (name_node): Process Status
master (name_node): Process Status
oozie_server (name_node): Web Server Status
----
Here are the errors for the other services in detail:
- A) HDFS also has these two health issues:
1) NameNode summary: <name_node_name> (Availability: Standby, Health: Good), <2nd name node> (Availability: Stopped, Health: Bad).
This health test is bad because the Service Monitor did not find an active NameNode.
2) Canary test failed to create the parent directory for /tmp/.cloudera_health_monitoring_canary_files.
- B) Oozie error:
The Cloudera Manager Agent is not able to communicate with this role's web server.
log entry:
ERROR org.apache.oozie.servlet.V0AdminServlet
SERVER[<name_node>] USER[hue] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] URL[GET http://<name_node>:11000/oozie/v0/admin/instrumentation] error, null java.lang.UnsupportedOperationException
- C) HBase has two errors:
1) HBase Master Health
Master summary: <name_node> (Availability: Unknown, Health: Bad). This health test is bad because the Service Monitor did not find an active Master.
2) master (<short_name_node>)
Process Status
This role's process exited. This role is supposed to be started.
ERROR org.apache.hadoop.hbase.master.HMasterCommandLine
Master exiting
java.lang.RuntimeException: HMaster Aborted
------------
And here is the CM health issue with the error details. Thank you.
The Reports Manager is not running.
This role's status is as expected. The role is stopped.
WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper
SQL Error: 0, SQLState: null
3:56:25.884 PM ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper
Connections could not be acquired from the underlying database!
3:56:25.884 PM WARN com.mchange.v2.resourcepool.BasicResourcePool
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@10ABCD -- Acquisition Attempt Failed!!! Clearing pending acquires.
While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (5). Last acquisition attempt exception:
ERROR com.cloudera.headlamp.HeadlampServer
Unable to upgrade schema to latest version.
org.hibernate.exception.GenericJDBCException: Could not open connection
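Because "Connections could not be acquired from the underlying database" appeared right after the domain name/IP change, it may be worth checking whether the database host and port configured for the Reports Manager are still reachable from the CM host. The following is a rough sketch of a plain TCP check: the function name can_connect is made up for illustration, and the host and port below are placeholders, not values from our setup. It only shows that something is listening, not that the database credentials or JDBC URL are correct.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder values: replace with the host and port from the
    # Reports Manager database settings in CM.
    db_host = "localhost"
    db_port = 5432
    state = "reachable" if can_connect(db_host, db_port) else "NOT reachable"
    print(db_host + ":" + str(db_port) + " is " + state)
```

If the old hostname or IP is still configured, a check like this would typically fail for the old address and succeed for the new one.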