NameNode keeps going down

Hi all,

I am having a problem with the NameNode status that Ambari shows. The following points are verifiable in the system:

- The NameNode keeps going down a few seconds after I start it through Ambari (it looks like it never really comes up, although the start process runs successfully);

- Despite being DOWN according to Ambari, if I run jps on the server where the NameNode is hosted, it shows that the service is running:

[hdfs@RHTPINEC008 ~]$ jps
39395 NameNode
4463 Jps

and I can access the NameNode UI properly;

- I already restarted both the NameNode and the ambari-agent manually, but the behavior stays the same;

- This problem started after some HBase/Phoenix heavy queries that caused the namenode to go down (not sure if this is actually related but the exact same configurations were working well before this episode);

- I've been digging for some hours and have not been able to find error details in the NameNode logs or in the ambari-agent logs that would help me understand the problem;
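
One thing that can be checked when jps shows a live NameNode but Ambari reports it as down is whether the PID recorded in the NameNode pid file matches the running process. The sketch below assumes that the agent's status check relies on a pid file and that the file lives at a common HDP default path; neither is confirmed in this thread, and `pid_file_matches` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare the PID stored in a pid file with the PID reported by jps.
# ASSUMPTION: this path is a common HDP default, not something stated in the thread.
PID_FILE="${PID_FILE:-/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid}"

# pid_file_matches <pid_file> <running_pid>   (hypothetical helper)
# Prints "match", "mismatch", or "missing"; returns 0 only on a match.
pid_file_matches() {
  local pid_file="$1" running_pid="$2" recorded_pid
  if [ ! -r "$pid_file" ]; then
    echo "missing"
    return 2
  fi
  # Strip whitespace and newlines so the comparison is exact.
  recorded_pid="$(tr -d '[:space:]' < "$pid_file")"
  if [ "$recorded_pid" = "$running_pid" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}

# Example, run as a user that can read the pid file:
#   running_pid="$(jps | awk '$2 == "NameNode" {print $1}')"
#   pid_file_matches "$PID_FILE" "$running_pid"
```

A "mismatch" or "missing" result would suggest a stale or unreadable pid file rather than a dead NameNode, which would be consistent with the symptoms listed above.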

I am using HDP 2.4.0 with no HA options.

Can someone help with this?

Thanks in advance

28 REPLIES 28

Master Mentor

@Subramanian Govindasamy

That means HDFS is down. Can you start it from the Ambari UI or the CLI?

@Geoffrey Shelton Okot I am able to start it from both, but the status changes to "INSTALLED" immediately after startup. On the server I can see the NameNode and DataNode running, but the Ambari console shows them as down.

Master Mentor

@Subramanian Govindasamy

Can you check the /etc/hosts entries on all the nodes? Then do the following with the ambari-agent on the affected node: move /var/lib/ambari-agent/data/structured-out-status.json to /tmp and restart the ambari-agent.

# ambari-agent restart

Do you see any error/exception in the /var/log/ambari-server/ambari-server.log?
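
The move-and-restart step described above could be scripted roughly as follows. This is only a sketch: `stash_status_file` is a hypothetical helper, and the paths are the ones quoted in this reply. The restart is left as a commented usage line so the helper can be reviewed before anything is run on the node.

```shell
#!/usr/bin/env bash
# Sketch: move the agent's cached status file out of the way before a restart.

# stash_status_file <status_file> <backup_dir>   (hypothetical helper)
# Prints "moved" if the file existed and was moved, "absent" otherwise.
stash_status_file() {
  local status_file="$1" backup_dir="$2"
  if [ -f "$status_file" ]; then
    mv "$status_file" "$backup_dir/" && echo "moved"
  else
    echo "absent"
  fi
}

# Usage on the affected node (as root):
#   stash_status_file /var/lib/ambari-agent/data/structured-out-status.json /tmp
#   ambari-agent restart
```

Moving the file instead of deleting it keeps a copy in /tmp in case it needs to be inspected or restored.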


@Geoffrey Shelton Okot

Sorry for the delay in replying. Same error.

INFO [ambari-heartbeat-processor-0] ServiceComponentHostImpl:1039 - Host role transitioned to a new state, serviceComponentName=NAMENODE, hostName=node1.test.co, oldState=STARTING, currentState=INSTALLED
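
To see how often this transition happens, the state changes for the NameNode can be pulled out of the server log. The sketch below assumes each log entry sits on a single line in the file, formatted like the entry quoted above; `namenode_transitions` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list NAMENODE state transitions found in Ambari server log lines.

# namenode_transitions   (hypothetical helper)
# Reads log lines on stdin and prints "OLD -> NEW" for each NAMENODE transition.
namenode_transitions() {
  grep 'serviceComponentName=NAMENODE' \
    | sed -n 's/.*oldState=\([A-Z_]*\), currentState=\([A-Z_]*\).*/\1 -> \2/p'
}

# Usage:
#   namenode_transitions < /var/log/ambari-server/ambari-server.log
```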

Since the users are available in AD, do I need to map them to local users? Could you please guide me here?

RULE:[1:$1@$0](shdfs@test.co)s/.*/hdfs/

    "message": "Invalid value for webhdfs parameter \"user.name\": Invalid value: \"shdfs@test.co\" does not belong to the domain ^[A-Za-z_][A-Za-z0-9._-]*[$]?$"

  }
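
The rejection in that message can be reproduced locally by testing names against the pattern it quotes. This is a minimal sketch using `grep -E`; `matches_user_pattern` is a hypothetical helper, and it only checks the regular expression, not the auth_to_local rule itself.

```shell
#!/usr/bin/env bash
# Sketch: test user names against the pattern quoted in the WebHDFS error message.
USER_PATTERN='^[A-Za-z_][A-Za-z0-9._-]*[$]?$'

# matches_user_pattern <name>   (hypothetical helper)
# Prints "valid" if the whole name matches the pattern, "invalid" otherwise.
matches_user_pattern() {
  if printf '%s\n' "$1" | grep -Eq "$USER_PATTERN"; then
    echo "valid"
  else
    echo "invalid"
  fi
}

matches_user_pattern 'shdfs@test.co'   # prints "invalid": '@' is not in the allowed character class
matches_user_pattern 'hdfs'            # prints "valid": a plain short name
```

The full principal fails because of the `@`, so WebHDFS is apparently receiving the unmapped name instead of the short name `hdfs` that the rule is meant to produce.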

Master Mentor

@Subramanian Govindasamy

I don't know the pattern you want to be translated, but here is a great HCC reference; you should be okay with this.

Please revert

Geoffrey Shelton Okot

The AD user is shdfs@Test.co. Could you please let me know which format the rule should have in order to start WebHDFS?

dfs.webhdfs.user.provider.user.pattern

@Geoffrey Shelton Okot

I have been stuck here for a long time. I appreciate your help!

Master Mentor

@Subramanian Govindasamy

Did you set up a one-way trust from the MIT KDC to Active Directory? If so, can you share your /etc/krb5.conf entry?