Member since: 02-03-2016
Posts: 9
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1407 | 02-07-2018 03:18 AM
 | 8685 | 10-11-2016 12:00 AM
02-07-2018
03:18 AM
As far as I know this is expected: each Cloudera Manager release supports its own API version and all the previous ones, but not future ones:
- CM 5.12 supports v17 and lower
- CM 5.13 supports v18 and lower
- CM 5.14 supports v19 and lower
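A quick way to double-check is the /api/version endpoint, which returns the highest API version the server accepts. A minimal sketch, assuming a CM server at cm-host:7180 and admin/admin credentials (both placeholders):

```python
import requests

CM_URL = "http://cm-host:7180"   # placeholder Cloudera Manager host
AUTH = ("admin", "admin")        # placeholder credentials

# /api/version returns the highest API version this CM server supports
# (e.g. "v18" on CM 5.13); requests for anything above that will fail.
resp = requests.get(CM_URL + "/api/version", auth=AUTH)
resp.raise_for_status()
print("Highest supported API version:", resp.text)
```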
01-12-2018
02:56 AM
Hi, I was digging into the HBase metrics available in Cloudera Manager and I can't really understand the difference between the following metrics:
- total_read_requests_rate_across_regionservers and total_write_requests_rate_across_regionservers (from the service home page): about 18000 req/s for reads and 100 req/s for writes
- total_requests_rate_across_regionservers (from the charts library): something like 1000 req/s
Given the names I was expecting something like total_requests = total_read_requests + total_write_requests, but this is clearly not the case. Which metric reflects the actual load of the HBase cluster? Thanks, parnigot.
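For reference, a minimal sketch of pulling the three rates side by side from the CM timeseries API; the host cm-host:7180, the admin/admin credentials, the API version v13 and the service name "hbase" are placeholders, not values taken from this cluster.

```python
import requests

CM_URL = "http://cm-host:7180"   # placeholder Cloudera Manager host
AUTH = ("admin", "admin")        # placeholder credentials

# One tsquery asking for all three rates; "hbase" is a placeholder service name.
QUERY = ('SELECT total_read_requests_rate_across_regionservers, '
         'total_write_requests_rate_across_regionservers, '
         'total_requests_rate_across_regionservers '
         'WHERE serviceName = "hbase"')

resp = requests.get(CM_URL + "/api/v13/timeseries",
                    params={"query": QUERY}, auth=AUTH)
resp.raise_for_status()

# Print the latest data point of each returned stream for a quick comparison.
for item in resp.json()["items"]:
    for stream in item["timeSeries"]:
        name = stream["metadata"]["metricName"]
        data = stream["data"]
        if data:
            print(name, data[-1]["value"])
```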
Labels:
- Apache HBase
- Cloudera Manager
10-11-2016
12:00 AM
1 Kudo
In the end we managed to solve this by excluding the problematic network interface from the agent monitoring. Cloudera Manager indeed has an option to do that in the hosts configuration section. For the NIC it's called Network Interface Collection Exclusion Regex (by default only the loopback interface is excluded). @ammolitor For the disks there are two options: Disk Device Collection Exclusion Regex and Filesystem Collection Exclusion Regex. Maybe one of these does the trick for you...
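As an illustration only (the interface names and the pattern are placeholders, not our actual configuration), an exclusion regex along the lines of `lo|bond0` would keep the default loopback exclusion and also drop the buggy bonded interface. A quick sketch of what such a pattern would match:

```python
import re

# Hypothetical value for "Network Interface Collection Exclusion Regex":
# the loopback interface plus the interface reporting the bogus counters.
EXCLUSION_REGEX = re.compile(r"^(lo|bond0)$")

for iface in ["lo", "eth0", "eth1", "bond0"]:
    status = "excluded" if EXCLUSION_REGEX.match(iface) else "collected"
    print(iface, status)
```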
09-08-2016
12:36 AM
Hi, I'm having an issue with multiple Cloudera Manager agents on a CDH 5.2 cluster. The error we are seeing in the CM web interface is a generic one: "This host is in contact with Cloudera Manager. The host's Cloudera Manager Agent's software version can not be determined". The issue is not permanent and randomly comes and goes. In the log file (/var/log/cloudera-scm-agent/cloudera-scm-agent.log) the daemon prints a lot of these messages:
[04/Sep/2016 17:40:54 +0000] 6757 MonitorDaemon-Reporter throttling_logger ERROR (9 skipped) Error sending messages to firehose: mgmt-HOSTMONITOR-398eb4f15a6b55c56ba3c74ad84d8633
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 75, in _send
self._requestor.request('sendAgentMessages', dict(messages=messages))
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 135, in request
self.write_call_request(message_name, request_datum, buffer_encoder)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 173, in write_call_request
self.write_request(message.request, request_datum, encoder)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 177, in write_request
datum_writer.write(request_datum, encoder)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 768, in write
raise AvroTypeException(self.writers_schema, datum)
... formatted python dictionary ...
is not an example of the schema [... whole avro schema ...]
From my understanding the agent fails to serialize the data collected from the host to Avro and can't update the Host Monitor. On these machines I also have a known issue with the reported speed of the NICs: according to the OS I have 10 Gbit interfaces that are doing hundreds of GiB/s (an obvious bug in the OS itself or in the NIC's firmware/driver). Using the data from the "Last HMON status" page of the agent's web UI I've discovered this "strange" coincidence: when an agent is experiencing the issue there's at least one NIC with a metric value greater than the max long value:
{'iface': 'bond0',
 'metrics': [ ...
  {'id': 11130,
   'value': 9435474096333102400L},
  ... ],
}
I don't know exactly what these metrics measure, maybe the bytes sent/received in the last minute? Nevertheless these numbers, compared with other hosts without the NIC issue, are exceptionally high. Now, in Python this isn't a problem because you basically can't overflow an int/long, but maybe the error above happens when the agent can't convert this very big number to a 64-bit long in Avro (9435474096333102400 is bigger than 9223372036854775807 = 2^63 - 1). I'm not sure about this because I can't really understand the Avro schema and I don't know if the expected type for value is a long. What do you guys think? Has someone experienced anything like this? And bonus question: is it possible to blacklist the bugged network interfaces from the agent statistics? Thanks, p
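To make the overflow hypothesis concrete, a tiny sketch of the range check that a 64-bit Avro long implies (whether the schema really declares value as a long is exactly what I'm unsure about; the metric value is the one from the status page above):

```python
# Avro "long" is a signed 64-bit integer; Python ints have no such limit,
# so the problem only shows up when the datum is validated against the schema.
LONG_MIN_VALUE = -(1 << 63)
LONG_MAX_VALUE = (1 << 63) - 1            # 9223372036854775807

metric_value = 9435474096333102400        # value reported for bond0 above

fits_avro_long = LONG_MIN_VALUE <= metric_value <= LONG_MAX_VALUE
print("fits in an Avro long:", fits_avro_long)   # False -> AvroTypeException
```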
07-27-2016
07:46 AM
Hello, we are currently experimenting with ACLs on YARN pools. Our goal is to have:
1. a pool for each application where only the authorized user can submit jobs
2. a group of users for each pool that can view application history and logs
I'm using the following fair-scheduler.xml file (generated with Cloudera Manager):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
<queue name="root">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps></aclSubmitApps>
<aclAdministerApps></aclAdministerApps>
<queue name="appA">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps>appA developersA</aclSubmitApps>
<aclAdministerApps>appA developersA</aclAdministerApps>
</queue>
<queue name="appB">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps>appB developersB</aclSubmitApps>
<aclAdministerApps>appB developersB</aclAdministerApps>
</queue>
</queue>
</allocations>
For point 1 (pool access only by the app user) everything works fine, but I can't find a working configuration for point 2: for example, if user devA (in group developersA) tries to view the logs of an application launched in appA, they always get the following error (in the JobHistory web console):
User [devA] is not authorized to view the logs for container_1469609032080_0001_01_000001 in log file
Any suggestion? Is this the intended behaviour or am I missing something? Our cluster specs/settings:
- yarn.acl.enable = true
- yarn.admin.acl = "yarn clusterAdminGroup"
- CDH 5.7
- Kerberos authentication
- YARN web interface also using Kerberos authentication
Thank you, bye
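For context on how the ACL strings above are read, a minimal sketch of the standard YARN ACL format (comma-separated users, a space, then comma-separated groups; "*" means everyone). The names are just the ones from the example pools:

```python
def parse_yarn_acl(acl):
    """Split a YARN ACL string into (users, groups).

    The format is "user1,user2 group1,group2"; "*" means everyone,
    and a lone space means nobody.
    """
    if acl.strip() == "*":
        return ["*"], ["*"]
    users_part, _, groups_part = acl.partition(" ")
    users = [u for u in users_part.split(",") if u]
    groups = [g for g in groups_part.split(",") if g]
    return users, groups

# The aclAdministerApps value used for queue appA above:
print(parse_yarn_acl("appA developersA"))   # (['appA'], ['developersA'])
```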