About bgooley

bgooley · ‎07-05-2018

@galzoran, That original post was from years ago, so let's gather your information to make sure we are troubleshooting the same thing. The stack traces from your issue will be more useful than the old ones. The agent periodically makes an HTTP request to each role running on the same host as the agent to load its JMX output, and supplies that output to Service Monitor for metrics collection. If that JMX loading fails, events in Cloudera Manager will indicate as much. The best way to start is to get the stack traces that occur when the agent fails to access the JMX information in the web resource. This information will be in the agent logs on that host (/var/log/cloudera-scm-agent/cloudera-scm-agent.log by default). If you can show us a few of those, it will give us a good idea of what to look at next.
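
As a rough sketch of one way to pull those stack traces out of the agent log, the Python snippet below collects every block that starts with a standard "Traceback (most recent call last):" header and ends at the first non-indented line after it (normally the exception line). It assumes the agent writes ordinary Python tracebacks to its log; the function name extract_tracebacks is hypothetical and is not part of Cloudera Manager.

```python
def extract_tracebacks(lines):
    """Return each Python traceback found in an iterable of log lines."""
    tracebacks = []
    current = None  # lines of the traceback being collected, if any
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Traceback (most recent call last):"):
            current = [line]
        elif current is not None:
            current.append(line)
            # Frame lines are indented; the first non-indented line is
            # assumed to be the final "ExceptionType: message" line.
            if not line.startswith(" "):
                tracebacks.append("\n".join(current))
                current = None
    return tracebacks
```

To try it against the default log location, open /var/log/cloudera-scm-agent/cloudera-scm-agent.log, pass the file object to extract_tracebacks, and print the last few entries. Multi-line exception messages would be cut off after their first line, so treat the result as a starting point and not a complete parser.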

bgooley · ‎07-05-2018

@balusu, Can you clarify what you are trying to test with lowercase realms? The realm in a Kerberos principal should be uppercase, so a lowercase realm is not expected. If you run "kinit", make certain you specify the realm in uppercase. The auth_to_local rules are not intended to match a lowercase realm, so the response you get is expected. -Ben

bgooley · ‎07-02-2018

@dinoma, Authentication errors rarely tell us more than that the authentication failed. Assuming the password is correctly encoded for a URL, some other possibilities to check:
- The password supplied is not correct in some way. You could verify the SMTP password by using telnet and manually typing in the non-encoded password to make sure it is correct.
- The encoding didn't work as expected, so the resulting encoded password was not correct, for example due to shell interpretation. You could try the following to see if it changes the resulting encoded password:
Create a new file containing the plain text, non-encoded password:
# vi ~/password_file
Strip off the trailing return character (write the file's contents, not its path, to the second file):
# printf '%s' "$(cat ~/password_file)" > ~/password_file2
Encode the password (quote the argument so the shell does not split or expand it):
# python -c "import urllib,sys; print urllib.quote_plus(sys.argv[1])" "$(cat ~/password_file2)"
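
If shell interpretation is the suspect, one option is to keep the shell out of the picture entirely. The following is a minimal Python 3 sketch, assuming the plain-text password was saved to a file as described above; the function name encode_password_file is hypothetical. It reads the file, strips only the trailing line ending, and URL-encodes the rest.

```python
from urllib.parse import quote_plus


def encode_password_file(path):
    """URL-encode the password stored in a text file, ignoring its trailing newline."""
    with open(path, encoding="utf-8") as handle:
        raw = handle.read()
    # Remove only trailing CR/LF characters; leading or embedded
    # whitespace is kept because it may be part of the password.
    return quote_plus(raw.rstrip("\r\n"))
```

Because the password never passes through a command line, characters such as $, ! or backticks cannot be expanded by the shell.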

bgooley · ‎06-29-2018

@dinoma, What you tried sounds like the right thing, but perhaps there are other characters in the string that don't stop the server from starting, yet prevent the password from being passed to the SMTP server properly. You can use this to print out an escaped version of the password to enter into the "Mail Server Password" field:
# python -c "import urllib,sys; print urllib.quote_plus(sys.argv[1])" "p%ss+word!"
Replace the trailing string in quotes with your literal password (no escaping). For example, the above command would result in:
p%25ss%2Bword%21
Cloudera has a Jira open for this issue, but we don't have a fix for it yet.
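
The one-liner above uses Python 2 syntax. On a host where only Python 3 is available, a roughly equivalent sketch would look like the following; the helper name encode_password is hypothetical, and the round-trip check is just a sanity test, not something Cloudera Manager requires.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_password(raw):
    """Return the URL-encoded form of a literal password."""
    return quote_plus(raw)


encoded = encode_password("p%ss+word!")
print(encoded)  # p%25ss%2Bword%21

# Decoding the result should give back the original password unchanged.
assert unquote_plus(encoded) == "p%ss+word!"
```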

bgooley · ‎06-29-2018

@ebeb, Actually, I just realized something very important. The supervisor stack trace you got is normal if no supervisor is running and no supervisord.conf file exists. I just tested, and I see exactly the same stack trace if I delete my supervisord.conf. The next thing that happens after this exception is that the agent attempts to start a supervisord process. The first step is to run "mount_tmpfs". I feel like something must be going wrong in that codepath, because we don't see any other lines after that. I went back and looked at your agent errors, and one seems very relevant:
OSError: [Errno 2] No such file or directory: '/etc/mtab'
It seems your /etc/mtab has gone missing. I just tested by removing the symbolic link and got exactly the same problem you are seeing.
RESOLUTION: recreate /etc/mtab using:
# ln -s /proc/self/mounts /etc/mtab
NOTE: you might check first to see if the contents of /proc/self/mounts look right. Hope this does the trick!
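
To confirm the state of /etc/mtab before and after recreating the link, a small check along these lines may help. This is only a sketch; the function name mtab_status is hypothetical, and it reports what it finds without changing anything.

```python
import os


def mtab_status(path="/etc/mtab"):
    """Describe whether path is missing, a symlink, or a regular file."""
    if not os.path.lexists(path):
        return "missing"
    if os.path.islink(path):
        target = os.readlink(path)
        # os.path.exists follows the link, so False here means the
        # link points at something that is not there.
        if not os.path.exists(path):
            return f"dangling symlink -> {target}"
        return f"symlink -> {target}"
    return "regular file"


if __name__ == "__main__":
    print(mtab_status())
```

After running the ln -s command above, the expected report would be a symlink pointing at /proc/self/mounts.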

bgooley · ‎06-28-2018

@ebeb, The Cloudera Manager agents use their own supervisor, so installing and running supervisord as a separate service will not help. At this stage, it may actually be reasonable to kill the supervisor, as there is something quite wrong when the supervisord.conf does not exist.
NOTE: The following will kill all child processes of the supervisor (including any hadoop processes that are running). It will also clean out the /var/run/cloudera-scm-agent directory and recreate files from scratch.
(1) Try stopping the agent in a way that will kill the supervisor and any running agent processes:
# service cloudera-scm-agent hard_stop_confirmed
(2) Run:
# ps aux |grep supervisord
If you see a supervisord process, kill it. Make sure no supervisord processes are running.
(3) Run:
# service cloudera-scm-agent clean_start
After this, check to see if the agent is heartbeating. I don't often recommend these steps, as there are usually better ways to isolate the root cause, but something very bad seems to have happened to the supervisor and/or the supervisor's configuration file.
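
For step (2), note that "ps aux |grep supervisord" usually lists the grep command itself as well. As a hedged sketch, the Python function below takes the text output of "ps aux" and returns only the PIDs of real supervisord processes. It assumes the standard 11-column "ps aux" layout; the name supervisord_pids is hypothetical.

```python
def supervisord_pids(ps_output):
    """Return PIDs of supervisord processes found in `ps aux` output."""
    pids = []
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something unexpected
        command = fields[10]
        # Skip the grep process that matches its own search term.
        if "supervisord" in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids
```

One way to feed it is subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout. An empty list after the hard stop suggests it is safe to move on to step (3).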

bgooley · ‎06-28-2018

@ebeb, We can tell from the stack trace that the failure occurred when the agent, acting as a client of the supervisor, attempted to read the supervisord.conf and failed. Further information is likely in /var/log/cloudera-scm-agent/supervisord.log or supervisord.out. I suggest checking them for clues about the cause. Also, try connecting with a command line utility to see if that gives any more error information:
# /usr/lib64/cmf/agent/build/env/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf

bgooley · ‎06-28-2018

@balusu, As mentioned, you would want to add the realm to the HDFS configuration "Trusted Kerberos Realms". This will allow Cloudera Manager to generate the necessary auth_to_local rule for that realm. The regex you used is, indeed, not correct as you have two "\E" but no "\Q" to match. I am not sure, exactly, what trouble you had with the case of realms, but the realm should always be in uppercase format. For more information on regex, etc., this is a great resource: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_sg_kerbprin_to_sn.html#topic_19_1

bgooley · ‎06-27-2018

That's great news. To avoid any confusion, the automatically generated auth_to_local rules (based on a realm listed in "Trusted Kerberos Realms") would look like this:
RULE:[1:$1@$0](.*@\QEXAMPLE.COM\E$)s/@\QEXAMPLE.COM\E$//
RULE:[2:$1@$0](.*@\QEXAMPLE.COM\E$)s/@\QEXAMPLE.COM\E$//
It appears that some of your characters were interpreted as special when you printed the generated rules.
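
To illustrate what rules of this shape do to a principal, here is a simplified Python sketch. It is not Hadoop's implementation: Python's re module has no \Q...\E quoting, so re.escape stands in for it, and the function name apply_rule is hypothetical. The number in the rule ([1:...] or [2:...]) is the count of components before the "@"; $1 is the first component and $0 is the realm.

```python
import re


def apply_rule(principal, realm, components):
    """Mimic one generated rule: strip @realm if the principal matches, else None."""
    name, _, principal_realm = principal.rpartition("@")
    parts = name.split("/")
    if not name or len(parts) != components:
        return None  # rule is for a different number of components
    base = f"{parts[0]}@{principal_realm}"  # the "$1@$0" string
    quoted = re.escape(realm)  # stand-in for \Q...\E literal quoting
    if not re.fullmatch(f".*@{quoted}$", base):
        return None  # no match, so this rule is not applied
    return re.sub(f"@{quoted}$", "", base)


print(apply_rule("exampleuser@EXAMPLE.COM", "EXAMPLE.COM", 1))  # exampleuser
print(apply_rule("exampleuser@example.com", "EXAMPLE.COM", 1))  # None
```

The second call shows why a lowercase realm falls through: the match is case-sensitive, so no rule applies to that principal.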

bgooley · ‎06-27-2018

Hi @balusu, Actually, the error in your log snippet is:
18/06/28 02:20:56 INFO util.KerberosName: No auth_to_local rules applied to exampleuser@example.com
This message appears when none of the rules in the "hadoop.security.auth_to_local" property in the server's core-site.xml matched the principal "exampleuser@example.com". This is not a Kerberos error; rather, it is a message returned by Hadoop code when Hadoop tries to map your principal to a Unix user name. Generally, if you are attempting to act on a Hadoop service as a user who is not in the Hadoop cluster's Kerberos realm, you need to make sure that the hadoop.security.auth_to_local property includes rules that will match the principal and convert the string to just a username. Cloudera Manager will create such rules for you if you add the other realm to the "Trusted Realms" or "Trusted Kerberos Realms" configuration. See: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_sg_kerbprin_to_sn.html Note that you will need to deploy client configuration and restart the cluster after making this change. -Ben

Member Since	‎04-22-2014 02:47 PM
Last Visited	‎04-24-2020 01:13 PM
Posts	1,218
Kudos received	339

Cloudera Community

Re: ALL hadoop-mapreduce-examples.jar fail cdh6

Re: YARN NodeManagers failed to start with permiss...

Re: Disable admin Login in Cloudera Manager

Re: Kerberos not authenticating from Hadoop Gatewa...

Re: Sqoop connection to Kerberos authenticated RDB...

Re: The Cloudera Manager Agent is not able to comm...

Re: Kerberos ticket error:No rules applied to hdfs...

Re: hit authentication problem when alert mail pas...

Re: hit authentication problem when alert mail pas...

Re: Failed to connect to previous supervisor

Re: Failed to connect to previous supervisor

Re: Failed to connect to previous supervisor

Re: Kerberos ticket error:No rules applied to hdfs...

Re: Kerberos ticket error:No rules applied to hdfs...

Re: Kerberos ticket error:No rules applied to hdfs...