
Multiple services down - Yarn - Directory is in an inconsistent state


Dear Friends

We need your help. We recently updated the domain name/IP in our Kerberos Active Directory authentication settings in Cloudera Manager (CM). Since then we have had the following health issues, and we cannot start our cluster or the CM service. Any help is much appreciated!

We have two NameNodes, one of which was added for High Availability.

Thanks very much in advance, and please let me know if you have any questions.

 

Kind regards

Andy

 

For the YARN JobHistory Server, here is the error:

This role's process exited. This role is supposed to be started.

 

Failed to start namenode.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/1/dfs/nn is in an inconsistent state: storage directory does not exist or is not accessible.

                at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:313)
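
For anyone wanting to reproduce the check behind the "does not exist or is not accessible" message, here is a minimal sketch of how the directory could be inspected on the NameNode host. It is only an illustration: the function name describe_storage_dir is hypothetical and not part of Hadoop, and it assumes the script is run as the same OS user as the NameNode process (often hdfs, but that is an assumption about this cluster).

```python
import os


def describe_storage_dir(path):
    """Return a short status string for a local storage directory.

    Hypothetical helper for illustration; it only mirrors the two
    conditions named in the error (missing or not accessible).
    """
    if not os.path.exists(path):
        return "missing"
    if not os.path.isdir(path):
        return "not a directory"
    # The current user needs read, write and traverse permission.
    needed = os.R_OK | os.W_OK | os.X_OK
    if not os.access(path, needed):
        return "not accessible"
    return "ok"


if __name__ == "__main__":
    # Path taken from the exception message above.
    print(describe_storage_dir("/data/1/dfs/nn"))
```

The result depends on which user runs it, so running it as root would not tell you whether the NameNode user can access the directory.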

 

 

Here is the list of health issues:

cluster

  HBase: HBase Master Health
  catalogserver (name_node): StateStore Connectivity
  impalad (first_data_node): StateStore Connectivity, Impala Daemon Ready Check, Web Server Status
  impalad (2nd_data_node): StateStore Connectivity, Impala Daemon Ready Check, Unexpected Exits, Web Server Status
  impalad (3rd_data_node): StateStore Connectivity, Impala Daemon Ready Check, Unexpected Exits, Web Server Status
  jobhistory (name_node): Process Status
  master (name_node): Process Status
  oozie_server (name_node): Web Server Status

 

----

Here are the errors for the other services in detail:

 

A) HDFS also has these two health issues:

 

1) NameNode summary: <name_node_name> (Availability: Standby, Health: Good), <2nd name node> (Availability: Stopped, Health: Bad).

This health test is bad because the Service Monitor did not find an active NameNode.

 

2) Canary test failed to create parent directory for /tmp/.cloudera_health_monitoring_canary_files.

 

B) Oozie error:

The Cloudera Manager Agent is not able to communicate with this role's web server.

log entry:

ERROR  org.apache.oozie.servlet.V0AdminServlet          

SERVER[<name_node>] USER[hue] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] URL[GET http://<name_node>:11000/oozie/v0/admin/instrumentation] error, null java.lang.UnsupportedOperationException

 

C) HBase has two errors:

 

1) HBase Master Health

Master summary: <name_node> (Availability: Unknown, Health: Bad). This health test is bad because the Service Monitor did not find an active Master.

 

2) master (<short_name_node>)

Process Status

This role's process exited. This role is supposed to be started.

ERROR  org.apache.hadoop.hbase.master.HMasterCommandLine          

Master exiting

java.lang.RuntimeException: HMaster Aborted

 

------------

 

And here is the CM health issue with the error details. Thank you.

 

The Reports Manager is not running.

This role's status is as expected. The role is stopped.

 

WARN   org.hibernate.engine.jdbc.spi.SqlExceptionHelper          

SQL Error: 0, SQLState: null

3:56:25.884 PM ERROR  org.hibernate.engine.jdbc.spi.SqlExceptionHelper          

Connections could not be acquired from the underlying database!

3:56:25.884 PM WARN   com.mchange.v2.resourcepool.BasicResourcePool         

com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@10ABCD -- Acquisition Attempt Failed!!! Clearing pending acquires.

While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (5). Last acquisition attempt exception:

 

ERROR  com.cloudera.headlamp.HeadlampServer          

Unable to upgrade schema to latest version.

org.hibernate.exception.GenericJDBCException: Could not open connection
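
Since the Reports Manager log says it could not open a database connection, a first step could be to test whether the database host and port are reachable from the CM host at all. Below is a rough sketch of such a test. The function name can_connect is hypothetical, and the host name and port in the example are placeholders, not values from our configuration. A successful TCP connection would not prove that the database credentials or the JDBC URL are correct.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration. It also returns False when
    the host name does not resolve, which is relevant after a
    domain name/IP change.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Placeholder values; replace with the database host and port
    # configured for the Reports Manager.
    print(can_connect("db-host.example.com", 5432))
```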
