New Contributor
Posts: 4
Registered: ‎04-03-2019

The DataNode is not connected to one or more of its NameNode(s)

Hi,

 

I have 10 DataNodes in my cluster, and the health of one of them is bad. I see the error below for this node.

 

'This host is in contact with the Cloudera Manager Server. This host is not in contact with the Host Monitor.' 

 

I have restarted the scm agent, but it still shows the same error. Below are the error messages from the cloudera-scm-agent.log file. Can someone please help me fix this?

 

[04/Apr/2019 12:52:26 +0000] 122842 MainThread agent        ERROR    Failed to configure inotify. Parcel repository will not auto-refresh.

Traceback (most recent call last):

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 813, in _init_after_first_heartbeat_response

    self.inotify = self.repo.configure_inotify()

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/parcel.py", line 415, in configure_inotify

    wm = pyinotify.WatchManager()

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/pyinotify-0.9.3-py2.7.egg/pyinotify.py", line 1706, in __init__

    raise OSError(err % self._inotify_wrapper.str_errno())

OSError: Cannot initialize new instance of inotify, Errno=Too many open files (EMFILE)

[04/Apr/2019 12:52:26 +0000] 122842 MainThread parcel_cache INFO     Using /opt/cloudera/parcel-cache for parcel cache

[04/Apr/2019 12:52:26 +0000] 122842 MainThread agent        ERROR    Caught unexpected exception in main loop.

Traceback (most recent call last):

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 710, in start

    self._init_after_first_heartbeat_response(resp_data)

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 840, in _init_after_first_heartbeat_response

    self.client_configs.load()

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/client_configs.py", line 682, in load

    new_deployed.update(self._lookup_alternatives(fname))

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/client_configs.py", line 432, in _lookup_alternatives

    return self._parse_alternatives(alt_name, out)

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/client_configs.py", line 444, in _parse_alternatives

    path, _, _, priority_str = line.rstrip().split(" ")

ValueError: too many values to unpack

 

Thanks

Expert Contributor
Posts: 108
Registered: ‎02-23-2018

Re: The DataNode is not connected to one or more of its NameNode(s)

Hi @Subba,

 

Have you tried restarting all components?

 

See this solution:

https://community.cloudera.com/t5/Cloudera-Manager-Installation/This-host-is-not-in-contact-with-the...

 

 

Regards,

Manu.

Cloudera Employee
Posts: 232
Registered: ‎01-15-2015

Re: The DataNode is not connected to one or more of its NameNode(s)

The relevant error is

OSError: Cannot initialize new instance of inotify, Errno=Too many open files (EMFILE)

Have you tried to hard restart the CM agent? Do you see those errors straight after startup?

Can you count the number of files in the /var/log/spark/lineage directory? There is a known issue when that directory contains a huge number of files.
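If it helps, here is a minimal Python sketch for counting the files in that directory. The helper name count_files is only illustrative, and the sketch assumes that counting the regular files directly inside the directory (not in subdirectories) is what matters here.

```python
import os


def count_files(directory):
    """Return the number of regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


# Directory taken from the post above; it may not exist on every host.
lineage_dir = "/var/log/spark/lineage"
if os.path.isdir(lineage_dir):
    print(lineage_dir, "contains", count_files(lineage_dir), "files")
else:
    print(lineage_dir, "does not exist on this host")
```

If the directory is missing, the script just reports that instead of failing.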

New Contributor
Posts: 4
Registered: ‎04-03-2019

Re: The DataNode is not connected to one or more of its NameNode(s)

Hi,

 

Thanks for the reply. I don't find a spark folder in /var/log. I'm still getting this issue even after restarting the scm-agent.

 

 

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 710, in start

    self._init_after_first_heartbeat_response(resp_data)

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 840, in _init_after_first_heartbeat_response

    self.client_configs.load()

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/client_configs.py", line 682, in load

    new_deployed.update(self._lookup_alternatives(fname))

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/client_configs.py", line 432, in _lookup_alternatives

    return self._parse_alternatives(alt_name, out)

  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/client_configs.py", line 444, in _parse_alternatives

    path, _, _, priority_str = line.rstrip().split(" ")

ValueError: too many values to unpack

 

 

I checked the JDK versions, but the same JDK is present on all the other servers. I'm not sure why I am having this issue only with this server.

 

[root@ip-10-0-1-32 ~]# rpm -qa "*jdk*"

java-1.8.0-openjdk-devel-1.8.0.144-0.b01.el7_4.x86_64

java-1.8.0-openjdk-headless-1.8.0.144-0.b01.el7_4.x86_64

jdk1.8.0_102-1.8.0_102-fcs.x86_64

copy-jdk-configs-2.2-3.el7.noarch

java-1.8.0-openjdk-1.8.0.144-0.b01.el7_4.x86_64

 

 

Thanks

Cloudera Employee
Posts: 232
Registered: ‎01-15-2015

Re: The DataNode is not connected to one or more of its NameNode(s)

The ValueError is caused by a new option introduced to alternatives in this version; its output causes the agent's alternatives parser (_parse_alternatives in client_configs.py) to fail.

 

To resolve this, you can do one of the following:

  1. Remove all OpenJDK packages from this host; they are not needed for CM / CDH.
  2. Apply the proposed workaround from this thread post (which you already posted to).
  3. Upgrade Cloudera Manager to 5.12.1 or later, which has code changes to deal with this situation.
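To illustrate why the parser fails, here is a small Python sketch. parse_strict repeats the unpacking pattern from the traceback. The two sample lines are assumptions about what the alternatives output might look like with and without the extra option, not output captured from the affected host. parse_tolerant is a hypothetical variant for comparison only; it is not the actual change made in Cloudera Manager.

```python
def parse_strict(line):
    # Same unpacking pattern as the line shown in the traceback:
    # it only works when the line splits into exactly four fields.
    path, _, _, priority_str = line.rstrip().split(" ")
    return path, int(priority_str)


def parse_tolerant(line):
    # Hypothetical variant: treat the first field as the path and the
    # last field as the priority, ignoring anything in between.
    fields = line.rstrip().split(" ")
    return fields[0], int(fields[-1])


# Assumed sample lines, for illustration only.
old_style = "/usr/java/jdk1.8.0_102/bin/java - priority 180102"
new_style = ("/usr/lib/jvm/java-1.8.0-openjdk/bin/java - family "
             "java-1.8.0-openjdk.x86_64 priority 1800144")

print(parse_strict(old_style))    # four fields, parses fine
try:
    parse_strict(new_style)       # six fields, unpacking fails
except ValueError as exc:
    print("ValueError:", exc)
print(parse_tolerant(new_style))  # tolerant variant still finds the priority
```

Any line with more than four space-separated fields triggers the same ValueError, which would explain why only a host whose alternatives output includes the extra fields is affected.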