Member since: 04-22-2014
Posts: 1218
Kudos Received: 341
Solutions: 157

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 26232 | 03-03-2020 08:12 AM |
|  | 16373 | 02-28-2020 10:43 AM |
|  | 4703 | 12-16-2019 12:59 PM |
|  | 4470 | 11-12-2019 03:28 PM |
|  | 6648 | 11-01-2019 09:01 AM |
10-31-2016
12:52 PM
Hello, What you did to work around this is fine. Basically, what you really want to do is wait until the decommission command is complete, since Cloudera Manager will not let you delete the role until commands running against that role have finished. Since "decommission" returns a command object, you can use its id to poll the commands list for that service. I imagine we have some example code for that lying around, but I'm not sure where it is. Waiting 60 seconds is likely OK, but verifying that the command has completed is more sound.
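Here is a minimal sketch of that polling approach using the cm_api Python client; the host, cluster, service, and role names below are hypothetical placeholders:

```python
# Sketch: wait for a decommission command instead of sleeping 60 seconds.
# Assumes the cm_api Python client; all names below are placeholders.
from cm_api.api_client import ApiResource

api = ApiResource('cm-host.example.com', username='admin', password='admin')
cluster = api.get_cluster('Cluster 1')            # hypothetical cluster name
service = cluster.get_service('HDFS-1')           # hypothetical service name

# decommission() returns a command object we can wait on.
cmd = service.decommission('HDFS-1-DATANODE-1')   # hypothetical role name
cmd = cmd.wait(timeout=600)                       # block until done or timeout
if not cmd.success:
    raise RuntimeError('Decommission failed: %s' % cmd.resultMessage)
```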
10-21-2016
04:24 PM
Hi Joe, Cloudera Manager does support the use case where you generate your own keytabs. Please see: http://www.cloudera.com/documentation/enterprise/latest/topics/sg_keytab_retrieval_script.html Essentially:
- AD admins create all the keytabs, and they are placed in a location on the Cloudera Manager host
- When Cloudera Manager needs one of the keytabs, the custom retrieval script locates the keytab, copies it to a temporary location, and then imports it into Cloudera Manager for storage in the Cloudera Manager database.
I hope that helps. -Ben
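For illustration, a minimal sketch of such a retrieval script, assuming the interface described in the linked documentation (Cloudera Manager invokes the script with the full principal name and a destination path); the staging directory and file-naming scheme below are assumptions your AD admins would define:

```python
#!/usr/bin/env python
# Sketch of a custom keytab retrieval script. Assumes CM passes two
# arguments: the full principal and the destination path for the keytab.
import shutil
import sys

KEYTAB_DIR = '/etc/cloudera-scm-server/keytabs'  # hypothetical staging dir

def main():
    principal, dest = sys.argv[1], sys.argv[2]
    # Map e.g. "hdfs/host.example.com@EXAMPLE.COM" to a file name; this
    # naming convention is an assumption, not part of the interface.
    name = principal.split('@')[0].replace('/', '_') + '.keytab'
    shutil.copyfile('%s/%s' % (KEYTAB_DIR, name), dest)

if __name__ == '__main__':
    main()
```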
10-19-2016
04:43 PM
4 Kudos
I would recommend reviewing the Cloudera Manager log for clues, but, for now, access your Cloudera Manager database and run the following: delete from CONFIGS where ATTR='web_tls'; This will disable TLS for the CM UI. Afterward, try starting again. If that doesn't help, let us know.
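If it helps, here is a minimal sketch of running that statement, assuming a PostgreSQL-backed Cloudera Manager database named "scm"; the host and credentials are hypothetical placeholders, and you should back up the database before modifying it directly:

```python
# Sketch: delete the web_tls config row from the CM database.
# Assumes PostgreSQL and psycopg2; connection details are placeholders.
import psycopg2

conn = psycopg2.connect(host='cm-host.example.com', dbname='scm',
                        user='scm', password='secret')
with conn, conn.cursor() as cur:           # commits on success
    cur.execute("DELETE FROM CONFIGS WHERE ATTR = 'web_tls'")
conn.close()
```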
10-11-2016
12:22 PM
Awesome! I thought I had tested that, but apparently not. If your agent is heartbeating now, that sounds like a good workaround until you can upgrade. I checked, and CM 5.8.3 should also have a fix when it is released. It has not gone to code freeze yet, so we are still weeks out on that. Thanks for sharing!
10-11-2016
10:25 AM
Sorry, there is no other workaround I can think of other than altering the code in "filesystem_map.py" (which I would not recommend). The only version of Cloudera Manager that has the fix at this time is 5.7.4. If you are on a previous release, then you can upgrade CM and the agents to get the fix. Regards, Ben
10-11-2016
08:59 AM
@ammolitor, The difference between your problem and the one @parnigot is seeing is that the large filesystem size is reported directly to Cloudera Manager via the agent's heartbeat. That cannot be excluded via configuration, so unmounting the file system would be the answer there until the fix is available in an upcoming release.

@parnigot, since your issue occurs (I just noted the full stack trace you provided) when the agent reports metrics to the Host Monitor, metric collection for that interface can be excluded via the Network Interface Collection Exclusion Regex. Even though the NIC's metrics seem to be misreported, I have opened an internal Cloudera Jira, OPSAPS-37261, so we can consider how to prevent this sort of thing from causing problems for the agent. Thanks for the very detailed information! Ben
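For illustration, a minimal sketch of how such an exclusion regex behaves, assuming the agent applies it as a full match against each interface name; the pattern and interface names below are hypothetical:

```python
# Sketch: interfaces matching the exclusion regex are skipped during
# metric collection. Pattern and names are illustrative placeholders.
import re

exclusion = re.compile(r'^(lo|docker0|veth.*)$')  # hypothetical pattern

for nic in ['eth0', 'lo', 'docker0', 'veth1a2b']:
    status = 'excluded' if exclusion.match(nic) else 'collected'
    print('%s: %s' % (nic, status))
```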
10-11-2016
08:41 AM
ammolitor, Cloudera has fixed this Cloudera Manager/Agent bug (Jira OPSAPS-35742), and the fix will be in the next possible releases of 5.5.x and up. For now, the workaround is to remove the large device, as the agent code looks at the device regardless of the exclusions. You can still give the exclusions a try, though.

parnigot, this seems to be a new manifestation of the same problem we saw with large file system sizes. I'll open a new Jira for this, as I don't think we have gotten a report of this at the interface level before. Great find on the workaround, too. Glad that works for the interface. Regards, Ben
10-10-2016
05:36 PM
Very interesting. I see that the output you have only has objectClass=top, when the default for Active Directory Account Properties is: accountExpires=0,objectClass=top,objectClass=person,objectClass=organizationalPerson,objectClass=user

If you have a little blue arrow near the "Active Directory Account Properties" configuration in the Kerberos Settings, click it to return to the default. That said, you can regenerate credentials by shutting down your cluster, checking all principals, and clicking the Regenerate Selected button. If you don't have Active Directory Delete Accounts on Credential Regeneration checked in your Kerberos Settings, you'll need to manually delete the principal objects from AD first. I don't think this is an outright bug, but it would be nice to know what is going on with the objectClass list.
10-10-2016
02:20 PM
Hello Joe, I didn't have a chance to reply to your original post, but that AD error was a bit unclear anyway. What did you end up identifying as the cause, and what solution, exactly, did you implement? It appears you removed the msds-supportedEncryptionTypes attribute. If so, what version of Windows/AD are you using?

We added that code in Cloudera Manager 5.8.0 so that, if desired, Cloudera Manager will create the principal objects in Active Directory with support for the AES128 and AES256 encryption types. You can opt out of this by making sure that Active Directory Set Encryption Types is set to the default of false in Administration --> Settings --> Kerberos. This is off by default, so it must have been checked at some point. If false, msds-supportedEncryptionTypes is not set. Another question is why your Active Directory schema does not support that attribute. Regards, Ben
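As background, a small sketch of the standard bit flags that make up the msDS-SupportedEncryptionTypes attribute value (per Microsoft's documentation; treat the exact combination CM writes as an assumption here):

```python
# Sketch: msDS-SupportedEncryptionTypes bit flags from Microsoft's docs.
DES_CBC_CRC = 0x01
DES_CBC_MD5 = 0x02
RC4_HMAC    = 0x04
AES128      = 0x08  # AES128-CTS-HMAC-SHA1-96
AES256      = 0x10  # AES256-CTS-HMAC-SHA1-96

# Enabling AES128 + AES256 would yield:
print(hex(AES128 | AES256))  # 0x18
```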
10-06-2016
08:29 AM
3 Kudos
Hello everyone, A few things based on previous comments:

(1) The = at the beginning indicates that the ntpd daemon on that host is polling the NTP server to get time. It does not mean it is synchronized, though. Any host in the list of NTP servers can be used to synchronize time.

(2) If there is no * at the beginning of the output when the agent runs "ntpdc -np", then you will get an offset alert if you have enabled them.

(3) Hadoop requires times to be synchronized as closely as possible, so the Clock Offset Health alert's intent is to inform cluster administrators that there could be a problem. If you verify that there is no problem (perhaps the NTP server is having some temporary trouble), you can adjust the thresholds or disable the alert altogether until the problem is resolved.

(4) What Naveen1 refers to is when the agent host is overwhelmed and cannot service the agent's health check requests in time, so multiple health checks can become bad or unknown. I would not consider them 'false' in the sense that something simple like getting NTP info or resolving a hostname failed. In that case, the host is likely overloaded to the point where it is in crisis and basic commands cannot run. To me that warrants some investigation into host/service tuning, adding RAM, adding new hosts to the cluster, etc., as it is unlikely to improve on its own and will cause more severe service problems in your cluster.

If you only see the clock offset health alert failing by itself, it is either:
- the offset exceeds the configured threshold in CM's Host Clock Offset Thresholds
- the "ntpdc -np" output did not have a "*" at the beginning of a line

For either, you can decide whether you want to receive the alerts if you can confirm your system time is in sync across the cluster. To shut off the alerts, set "Host Clock Offset Thresholds" Critical: Never.

BOTTOM LINE: The alert is there to warn you that something very bad seems to be happening so you can check it out. If you know things are OK or don't care, you can shut off the alert. If you want the alerts on, then "ntpdc -np" needs to return expected results; ntpdc queries ntpd, so ntpd needs to be healthy, too.

NOTE: chrony is supported in Cloudera Manager 5.8 and up; if you use chrony, support should be seamless. Cheers, Ben
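As a side note, here is a minimal sketch of the "*" check described in point (2), assuming "ntpdc" is on the PATH of the host you run it on; a line beginning with "*" marks the peer ntpd has actually synchronized to:

```python
# Sketch: check whether ntpd reports a sync peer, mirroring the agent's
# "ntpdc -np" check. Assumes ntpdc is installed and on the PATH.
import subprocess

output = subprocess.check_output(['ntpdc', '-np']).decode()
synced = any(line.startswith('*') for line in output.splitlines())
print('in sync' if synced else 'no sync peer found (offset alert likely)')
```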