<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Agents unable to contact Host-Monitor for avro schema errors in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46149#M40061</link>
    <description>&lt;P&gt;ammolitor,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cloudera has fixed this Cloudera Manager/Agent bug (Jira OPSAPS-35742), and the fix will be included in the next available releases of 5.5.x and up.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For now, the workaround is to remove the large device, because the agent code inspects the device regardless of the exclusions. You can still give the exclusions a try, though.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;parnigot, this seems to be a new manifestation of the same problem we saw with large file system sizes. I'll open a new Jira for this, as I don't think we have received a report of this at the interface level before. Great find on the workaround, too. Glad that works for the interface.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ben&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 11 Oct 2016 15:41:22 GMT</pubDate>
    <dc:creator>bgooley</dc:creator>
    <dc:date>2016-10-11T15:41:22Z</dc:date>
    <item>
      <title>Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/44886#M40057</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm having an issue with multiple Cloudera Manager agents on a CDH 5.2 cluster.&lt;BR /&gt;The error we are seeing in the CM web interface is a generic one: &lt;EM&gt;This host is in contact with Cloudera Manager. The host's Cloudera Manager Agent's software version can not be determined&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;The issue is not permanent; it comes and goes at random.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the log file (/var/log/cloudera-scm-agent/cloudera-scm.agent.log) the daemon prints many messages like this one:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;[04/Sep/2016 17:40:54 +0000] 6757 MonitorDaemon-Reporter throttling_logger ERROR    (9 skipped) Error sending messages to firehose: mgmt-HOSTMONITOR-398eb4f15a6b55c56ba3c74ad84d8633
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 75, in _send
    self._requestor.request('sendAgentMessages', dict(messages=messages))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 135, in request
    self.write_call_request(message_name, request_datum, buffer_encoder)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 173, in write_call_request
    self.write_request(message.request, request_datum, encoder)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 177, in write_request
    datum_writer.write(request_datum, encoder)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 768, in write
    raise AvroTypeException(self.writers_schema, datum)

    ... formatted python dictionary ...

    is not an example of the schema [... whole avro schema...]&lt;/PRE&gt;&lt;P&gt;From my understanding, the agent fails to serialize the data collected from the host to Avro and therefore can't update the Host Monitor.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On these machines I also have a known issue with the speed reported for the NICs: according to the OS, I have 10 Gbit/s interfaces that are doing hundreds of GiB/s (an obvious bug in the OS itself or in the NIC's firmware/driver).&lt;/P&gt;&lt;P&gt;Using the data from the &lt;EM&gt;"Last HMON status"&lt;/EM&gt; page of the agent's web UI, I've discovered this "strange" coincidence: when an agent is experiencing the issue, there is at least one NIC with a metric value greater than the maximum signed 64-bit long value:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;{'iface': 'bond0',
 'metrics': [ ....
             {'id': 11130,
              'value': 9435474096333102400L},
              ... ],
}&lt;/PRE&gt;&lt;P&gt;I don't know exactly what this metric measures, maybe the bytes sent/received in the last minute? Nevertheless, these numbers are exceptionally high compared with other hosts without the NIC issue.&lt;/P&gt;&lt;P&gt;Now, in Python this isn't a problem, because an int/long has arbitrary precision and basically can't overflow. But maybe the error above happens when the agent can't convert this very big number to a 64-bit long in Avro (9435474096333102400 is bigger than 9223372036854775807 = 2^63-1). I'm not sure about this, because I can't really understand the Avro schema and I don't know if the expected type for &lt;EM&gt;value&lt;/EM&gt; is a long.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What do you guys think? Has anyone experienced anything like this?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And a bonus question: is it possible to blacklist the buggy network interfaces from the agent statistics?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;p&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:38:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/44886#M40057</guid>
      <dc:creator>parnigot</dc:creator>
      <dc:date>2022-09-16T10:38:23Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46128#M40058</link>
      <description>Seeing a similar agent/Python error with a large filesystem mounted. Any workarounds found yet?</description>
      <pubDate>Tue, 11 Oct 2016 05:00:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46128#M40058</guid>
      <dc:creator>ammolitor</dc:creator>
      <dc:date>2016-10-11T05:00:04Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46132#M40059</link>
      <description>&lt;P&gt;In the end we managed to solve this by excluding the problematic network interface from the agent monitoring.&lt;/P&gt;&lt;P&gt;Cloudera Manager does have an option to do that in the hosts configuration section. For the NIC it's called &lt;EM&gt;Network Interface Collection Exclusion Regex&lt;/EM&gt; (by default only the loopback interface is excluded).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/16403"&gt;@ammolitor&lt;/a&gt;&lt;/P&gt;&lt;P&gt;For the disks there are two options: &lt;EM&gt;Disk Device Collection Exclusion Regex&lt;/EM&gt; and &lt;EM&gt;Filesystem Collection Exclusion Regex&lt;/EM&gt;.&lt;BR /&gt;Maybe one of these does the trick for you...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 07:00:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46132#M40059</guid>
      <dc:creator>parnigot</dc:creator>
      <dc:date>2016-10-11T07:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46147#M40060</link>
      <description>&lt;P&gt;Fantastic, thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 15:19:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46147#M40060</guid>
      <dc:creator>ammolitor</dc:creator>
      <dc:date>2016-10-11T15:19:26Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46149#M40061</link>
      <description>&lt;P&gt;ammolitor,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cloudera has fixed this Cloudera Manager/Agent bug (Jira OPSAPS-35742), and the fix will be included in the next available releases of 5.5.x and up.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For now, the workaround is to remove the large device, because the agent code inspects the device regardless of the exclusions. You can still give the exclusions a try, though.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;parnigot, this seems to be a new manifestation of the same problem we saw with large file system sizes. I'll open a new Jira for this, as I don't think we have received a report of this at the interface level before. Great find on the workaround, too. Glad that works for the interface.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ben&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 15:41:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46149#M40061</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2016-10-11T15:41:22Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46150#M40062</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/16403"&gt;@ammolitor&lt;/a&gt;, the difference between your problem and the one that &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/14480"&gt;@parnigot&lt;/a&gt; is seeing is that the large filesystem size is reported directly to Cloudera Manager via the agent's heartbeat. That cannot be excluded via configuration, so unmounting the file system would be the answer there until the fix is available in an upcoming release.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/14480"&gt;@parnigot&lt;/a&gt;, since your issue occurs when the agent is reporting metrics to the Host Monitor (I just noticed the full stack trace you provided), the metric collection for that interface can be excluded via &lt;EM&gt;Network Interface Collection Exclusion Regex&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Even though the NIC's metrics seem to be misreported, I have opened an internal Cloudera Jira, OPSAPS-37261, so we can consider how to prevent this sort of thing from causing problems for the agent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for the very detailed information!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ben&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 15:59:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46150#M40062</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2016-10-11T15:59:24Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46153#M40063</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/4054"&gt;@bgooley&lt;/a&gt; Is this coming soon?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This large filesystem is THE filesystem for my cluster, so unmounting it is not an option in this case. Is there any other workaround?&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 17:04:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46153#M40063</guid>
      <dc:creator>ammolitor</dc:creator>
      <dc:date>2016-10-11T17:04:56Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46154#M40064</link>
      <description>&lt;P&gt;Sorry, there is no other workaround I can think of other than altering the code in "filesystem_map.py" (which I would not recommend).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The only version of Cloudera Manager that has the fix at this time is 5.7.4. If you are on a previous release, you can upgrade CM and the agents to get the fix.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ben&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 17:25:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46154#M40064</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2016-10-11T17:25:52Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46156#M40065</link>
      <description>&lt;P&gt;Editing config.ini seemed to get us where we need to be.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Specifically, we removed nfs and nfs4 from monitored_nodev_filesystem_types:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;sed -i 's/nfs,nfs4,//g' /etc/cloudera-scm/agent/config.ini&lt;/PRE&gt;</description>
      <pubDate>Tue, 11 Oct 2016 19:12:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46156#M40065</guid>
      <dc:creator>ammolitor</dc:creator>
      <dc:date>2016-10-11T19:12:55Z</dc:date>
    </item>
    <item>
      <title>Re: Agents unable to contact Host-Monitor for avro schema errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46157#M40066</link>
      <description>&lt;P&gt;Awesome! I thought I had tested that, but apparently not. If your agent is heartbeating now, that sounds like a good workaround until you can upgrade.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I checked, and CM 5.8.3 should also have the fix when it is released. It has not gone to code freeze yet, so that release is still weeks away.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for sharing!&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 19:22:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Agents-unable-to-contact-Host-Monitor-for-avro-schema-errors/m-p/46157#M40066</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2016-10-11T19:22:26Z</dc:date>
    </item>
  </channel>
</rss>

