Member since: 02-03-2016
Posts: 9
Kudos Received: 1
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1358 | 02-07-2018 03:18 AM |
| | 8502 | 10-11-2016 12:00 AM |
02-07-2018
03:18 AM
As far as I know this is expected: each Cloudera Manager release supports its own API version and all previous ones; it doesn't work with future versions.
- CM 5.12 -> supports v17 and lower
- CM 5.13 -> supports v18 and lower
- CM 5.14 -> supports v19 and lower
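A quick way to confirm which API version a given Cloudera Manager instance supports is to query its /api/version endpoint. A minimal sketch in Python, where the host and credentials are placeholders rather than values from this thread:

# Ask Cloudera Manager for the highest API version it supports.
# Host, port and credentials below are placeholders.
import requests

CM_HOST = "cm-host.example.com"
resp = requests.get("http://%s:7180/api/version" % CM_HOST,
                    auth=("admin", "admin"), timeout=10)
resp.raise_for_status()
print("Highest supported API version:", resp.text)  # e.g. "v17" on CM 5.12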
01-12-2018
02:56 AM
Hi, I was digging into the HBase metrics available in Cloudera Manager and I can't really understand the differences between the following metrics:
- total_read_requests_rate_across_regionservers and total_write_requests_rate_across_regionservers (from the service home page): about 18000 req/s for reads and 100 req/s for writes
- total_requests_rate_across_regionservers (from the charts library): something like 1000 req/s

Given the names I was expecting something like total_requests = total_read_requests + total_write_requests, but this is clearly not the case. Which metric reflects the actual load of the HBase cluster? Thanks, parnigot.
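For context, the three metrics can be pulled side by side through the Cloudera Manager time-series endpoint and compared directly. A minimal sketch, assuming a reachable CM host, admin credentials and API v17; the query may need adjusting to the actual HBase service name:

# Compare the three HBase request-rate metrics via the CM time-series API.
# Host, credentials, API version and service name are assumptions.
import requests

CM_HOST = "cm-host.example.com"
QUERY = ("SELECT total_read_requests_rate_across_regionservers, "
         "total_write_requests_rate_across_regionservers, "
         "total_requests_rate_across_regionservers "
         "WHERE serviceName = hbase")

resp = requests.get("http://%s:7180/api/v17/timeseries" % CM_HOST,
                    params={"query": QUERY}, auth=("admin", "admin"), timeout=30)
resp.raise_for_status()
for ts in resp.json()["items"][0]["timeSeries"]:
    if ts["data"]:
        print(ts["metadata"]["metricName"], "->", ts["data"][-1]["value"])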
Labels:
- Apache HBase
- Cloudera Manager
10-11-2016
12:00 AM
1 Kudo
In the end we managed to solve this by excluding the problematic network interface from the agent monitoring. Cloudera Manager indeed has an option to do that in the hosts configuration section. For the NIC it's called Network Interface Collection Exclusion Regex (by default only the loopback interface is excluded). @ammolitor For the disks there are two options: Disk Device Collection Exclusion Regex and Filesystem Collection Exclusion Regex. Maybe one of these does the trick for you...
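For anyone tuning the same setting, a minimal sketch of how such an exclusion regex behaves; the pattern and interface names below are illustrative assumptions, not values from this thread:

# Test a candidate value for "Network Interface Collection Exclusion Regex".
# Pattern and interface names are hypothetical examples.
import re

exclusion_regex = re.compile(r"^lo$|^bond0$")  # loopback plus a problematic bond

for iface in ["lo", "eth0", "bond0", "bond1"]:
    state = "excluded" if exclusion_regex.match(iface) else "monitored"
    print(iface, "->", state)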
09-08-2016
12:36 AM
Hi, I'm having an issue with multiple Cloudera Manager agents on a CDH 5.2 cluster. The error we are seeing on the CM web interface is a generic one: "This host is in contact with Cloudera Manager. The host's Cloudera Manager Agent's software version can not be determined". The issue is not permanent and randomly comes and goes. In the log files (/var/log/cloudera-scm-agent/cloudera-scm-agent.log) the daemon prints a lot of these messages:

[04/Sep/2016 17:40:54 +0000] 6757 MonitorDaemon-Reporter throttling_logger ERROR (9 skipped) Error sending messages to firehose: mgmt-HOSTMONITOR-398eb4f15a6b55c56ba3c74ad84d8633
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 75, in _send
self._requestor.request('sendAgentMessages', dict(messages=messages))
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 135, in request
self.write_call_request(message_name, request_datum, buffer_encoder)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 173, in write_call_request
self.write_request(message.request, request_datum, encoder)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 177, in write_request
datum_writer.write(request_datum, encoder)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 768, in write
raise AvroTypeException(self.writers_schema, datum)
... formatted python dictionary ...
is not an example of the schema [... whole avro schema...]

From my understanding the agent fails to serialize the data collected from the host to Avro and can't update the Host Monitor. On these machines I also have a known issue with the reported speed of the NICs: according to the OS I have 10 Gbit interfaces that are doing hundreds of GiB/s (an obvious bug in the OS itself or in the NIC's firmware/driver). Using the data from the "Last HMON status" page of the agent's web UI I've discovered this "strange" coincidence: when an agent is experiencing the issue there is at least one NIC with a metric value greater than the max long value:

{'iface': 'bond0',
 'metrics': [ ....
   {'id': 11130,
    'value': 9435474096333102400L},
   ... ],
}

I don't know exactly what these metrics measure, maybe the bytes sent/received in the last minute? Nevertheless these numbers, compared with other hosts without the NIC issue, are exceptionally high. Now, in Python this isn't a problem because you basically can't overflow an int/long, but maybe the error above happens because the agent can't convert this very big number to a 64-bit long in Avro (9435474096333102400 is bigger than 9223372036854775807 = 2^63 - 1). I'm not sure about this because I can't really understand the Avro schema and I don't know if the expected type for value is a long. What do you guys think? Has someone experienced anything like this? And bonus question: is it possible to blacklist the bugged network interfaces from the agent statistics? Thanks, p
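A quick sketch of the overflow suspicion above, using the value reported by the agent: Avro's long type is a signed 64-bit integer, so anything above 2^63 - 1 cannot be encoded as a long.

# Avro "long" is a signed 64-bit integer; the reported metric exceeds its range.
AVRO_LONG_MAX = 2**63 - 1             # 9223372036854775807
reported_value = 9435474096333102400  # value reported for bond0, metric id 11130

print("fits in an Avro long:", reported_value <= AVRO_LONG_MAX)  # -> False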
07-27-2016
07:46 AM
Hello, We are currently experimenting with ACLs on YARN pools. Our goal is to have:
1. a pool for each application where only the authorized user can submit jobs
2. a group of users for each pool that can view application history and logs

I'm using the following fair-scheduler.xml file (generated with Cloudera Manager):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
<queue name="root">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps></aclSubmitApps>
<aclAdministerApps></aclAdministerApps>
<queue name="appA">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps>appA developersA</aclSubmitApps>
<aclAdministerApps>appA developersA</aclAdministerApps>
</queue>
<queue name="appB">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps>appB developersB</aclSubmitApps>
<aclAdministerApps>appB developersB</aclAdministerApps>
</queue>
</queue>
</allocations>

For point 1 (pool access only by the app user) everything works fine, but I can't find a working configuration for point 2: for example, if user devA (in group developersA) tries to view the logs of an application launched in appA, they always get the following error (in the JobHistory web console):

User [devA] is not authorized to view the logs for container_1469609032080_0001_01_000001 in log file

Any suggestion? Is this the intended behaviour or am I missing something? Our cluster specs/settings:
- yarn.acl.enable = true
- yarn.admin.acl = "yarn clusterAdminGroup"
- CDH 5.7
- Kerberos authentication
- YARN web interface also using Kerberos authentication

Thank you, Bye
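As a side note on the ACL strings used above: a minimal sketch of how a fair-scheduler ACL entry like "appA developersA" can be read, assuming the usual YARN convention of comma-separated users, then a space, then comma-separated groups. This is only an illustration of the expected behaviour, not YARN's actual ACL implementation:

# Interpret a fair-scheduler ACL string, assuming the "users groups" convention.
# Illustrative only; not YARN's actual ACL code.
def acl_allows(acl, user, user_groups):
    if acl.strip() == "*":
        return True                                # wildcard: everyone allowed
    parts = acl.split(" ", 1)
    users = set(u for u in parts[0].split(",") if u)
    groups = set(g for g in parts[1].split(",") if g) if len(parts) > 1 else set()
    return user in users or bool(groups & set(user_groups))

# devA is in group developersA, so the appA queue ACL should allow it:
print(acl_allows("appA developersA", "devA", ["developersA"]))  # -> True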