About bgooley

bgooley · ‎04-23-2019

@wusj, I was a bit unclear in my previous update regarding the packages to install. The CM 6.2 Agent package needs to be installed on each of your cluster hosts and the agent needs to be restarted with "service cloudera-scm-agent restart" or "systemctl restart cloudera-scm-agent" in order for the code fix to be used.

bgooley · ‎04-23-2019

@wusj, It does indeed look like the same issue, but you stated you installed CDH 6.2. I assume then you have also installed the Cloudera Manager 6.2 packages on *both* the agents and the Cloudera Manager server. The fix is in the agent code, so you need to make sure all agents are version 6.2 and that they have been restarted after upgrading the packages. I just tested by upgrading from Cloudera Manager 6.1 to 6.2 and the issue was resolved in my environment without any configuration changes.

If you did restart CM and the agents after installing CM 6.2, then please share the output of the following from one of the hosts having the problem:

    grep -v -e '^[[:space:]]*$' -e '^#' /etc/cloudera-scm-agent/config.ini

This will show us your agent configuration. Feel free to leave out your CM hostname, as we are mainly concerned with your [security] section.

Ben
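If grep is not handy, or you want to post-process the result, the same filtering can be sketched in Python. This is only an illustration of what the grep above does (drop blank lines and lines that start with "#"); the function name and the sample text are made up for the example.

```python
import re

# Same pattern as grep's '^[[:space:]]*$': a line that is empty or only whitespace.
BLANK_LINE = re.compile(r"^\s*$")


def strip_comments_and_blanks(text):
    """Mimic: grep -v -e '^[[:space:]]*$' -e '^#'"""
    return [
        line
        for line in text.splitlines()
        if not BLANK_LINE.match(line) and not line.startswith("#")
    ]


if __name__ == "__main__":
    # Hypothetical sample; on a real host you would read
    # /etc/cloudera-scm-agent/config.ini instead.
    sample = "# a comment\n\n[Security]\nuse_tls=1\n   \n"
    for line in strip_comments_and_blanks(sample):
        print(line)
```

Like the grep, this only drops comments that start in the first column; an indented "#" line is kept.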

bgooley · ‎04-07-2019

@mmmunafo, Restarting the status server will not help since the underlying issue will remain the same. Also, CDH-76040 is not the cause of this issue as it is specific to the Resource Manager. CDH-76040 is actually resolved in 6.1.1 and 6.2 (both available).

The problem you are seeing is likely caused by a new feature added in Cloudera Manager 6.1 that attempts to secure port 9000 (the status server port that the Cloudera Manager agent uses to respond to requests from Cloudera Manager). The agent will detect that it should run the host inspector command, but when Cloudera Manager attempts to download the results on port 9000, it thinks that TLS should be used even though port 9000 is not listening via TLS.

The issue impacts CM 6.1 but was fixed in Cloudera Manager 6.2. If you can upgrade to CM 6.2 (CDH does not need to be upgraded right away) then this issue should go away. For reference, the fix is associated with internal Cloudera Jira OPSAPS-48958.

Note that the issue you are facing occurs when agent communication is encrypted, but the agents have no keys, certificates, or truststore configured. Apart from upgrading to 6.2, the other solution is to configure all agents with their own keys, certificates, and "verify_cert_file" configurations that would normally be there if CM Agent authentication was enabled.
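To spot hosts that match the condition described above (encryption on, but no keys, certificates, or truststore), something like the following rough sketch could be run against each agent's config.ini. Only "verify_cert_file" is named in this thread; the other option names (use_tls, client_key_file, client_cert_file) are what I would expect to find in the agent configuration and should be treated as assumptions, as should the function name.

```python
import configparser

# Assumed option names; only verify_cert_file is mentioned in the post itself.
TLS_FLAG = "use_tls"
KEY_MATERIAL_OPTIONS = ("verify_cert_file", "client_key_file", "client_cert_file")


def tls_without_key_material(config_text):
    """Return True if TLS is enabled but no key/cert/truststore option is set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    for section in parser.sections():
        # Match the security section regardless of capitalization.
        if section.lower() != "security":
            continue
        options = parser[section]
        tls_on = options.get(TLS_FLAG, "0").strip() == "1"
        has_material = any(
            options.get(name, "").strip() for name in KEY_MATERIAL_OPTIONS
        )
        return tls_on and not has_material
    return False


if __name__ == "__main__":
    # Hypothetical config text for illustration only.
    sample = "[Security]\nuse_tls=1\n# verify_cert_file=\n"
    print(tls_without_key_material(sample))
```

A True result only means the host fits the pattern described in the post; it does not by itself confirm that the host inspector failure comes from this bug.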

bgooley · ‎03-22-2019

@IVenkatesh, Cloudera Manager knows which database server to connect to based on what it sees in /etc/cloudera-scm-server/db.properties. We would not know which server should be used. It sounds like you'll have to figure out what has changed since you tried "moving" the mysql server. If you used to have more users, try accessing each 'scm' database with a mysql client and see what rows exist in the USERS table. That might give you some clues.
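If it helps to see at a glance which database a given db.properties points at, here is a small sketch that parses the file's key=value lines. The property names used in the example (com.cloudera.cmf.db.host and com.cloudera.cmf.db.name) are what I would expect to see in that file, but treat them and the sample values as assumptions and check them against your own copy.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' or '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    # Hypothetical contents; on a real host, read
    # /etc/cloudera-scm-server/db.properties instead.
    sample = (
        "# database settings\n"
        "com.cloudera.cmf.db.type=mysql\n"
        "com.cloudera.cmf.db.host=db1.example.com\n"
        "com.cloudera.cmf.db.name=scm\n"
    )
    props = parse_properties(sample)
    print(props.get("com.cloudera.cmf.db.host"), props.get("com.cloudera.cmf.db.name"))
```

Running this against the db.properties from before and after the move would show whether the host or database name changed.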

bgooley · ‎03-15-2019

@Somanath, Based on my code review and testing, the original logging that was provided in this thread is caused by a minor bug in CM 5.12 and higher that occurs only when single-user mode is configured or the agent is not running as root. I opened a new internal Cloudera Jira for this issue: OPSAPS-49735.

In my case, though, even though I reproduced the errors, this did not prevent the Zookeeper server from starting. I would still advise reviewing the logs to make certain of the cause of the server failing to start.

On any host showing the "UnboundLocalError: local variable 'mdata' referenced before assignment" error:

(1) Back up your os_ops.py file so you can roll back if required.

Assuming you have Python 2.7, as shown in the error posted in this thread, you can find the os_ops.py file here:

/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/os_ops.py

    prompt> cd /usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/
    prompt> cp ./os_ops.py ./os_ops.py.original

(2) Edit os_ops.py to move "mdata = self.get_path_metadata(path)" before the "if" conditional:

    prompt> vim os_ops.py

Locate the following block of code in mkabsdir():

    if os.path.isdir(path):
        # Log warnings if user/group/mode are different than what's expected
        if self.honor_users_and_groups:
            mdata = self.get_path_metadata(path)

Move this line above "if self.honor_users_and_groups:":

    mdata = self.get_path_metadata(path)

The result should look like this:

    if os.path.isdir(path):
        # Log warnings if user/group/mode are different than what's expected
        mdata = self.get_path_metadata(path)
        if self.honor_users_and_groups:
            if user is not None and user != mdata.user:
                LOG.warning('Expected user %s for %s but was %s', user, path, mdata.user)
            if group is not None and group != mdata.group:
                LOG.warning('Expected group %s for %s but was %s', group, path, mdata.group)
        if mode is not None and oct(mode) != mdata.mode:
            LOG.warning('Expected mode %s for %s but was %s', oct(mode), path, mdata.mode)
        return False

Save your edits. This change will make sure that "mdata" is assigned a value before it is referenced.

(3) Restart the agent on the host where you updated os_ops.py:

    prompt> systemctl restart cloudera-scm-agent

or on el6 OSes:

    prompt> service cloudera-scm-agent restart

(4) If the agent does not restart and it cites some Python problem, you can revert by copying the "os_ops.py.original" file over the "os_ops.py" file you edited. Restart the agent after that.
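For anyone curious why moving that one line matters, here is a stripped-down illustration of the same pattern. It is not the agent code: the function names and the dictionary standing in for the path metadata are made up, and it assumes (as the error suggests) that mdata is also read outside the honor_users_and_groups branch.

```python
def check_before_fix(honor_users_and_groups, mode, actual_mode):
    """mdata is only assigned when honor_users_and_groups is true."""
    if honor_users_and_groups:
        mdata = {"mode": actual_mode}
    # When the branch above is skipped, mdata was never assigned, so reading
    # it here raises UnboundLocalError.
    if mode is not None and mode != mdata["mode"]:
        return "warning"
    return "ok"


def check_after_fix(honor_users_and_groups, mode, actual_mode):
    """mdata is assigned before any branch, so every later read is safe."""
    mdata = {"mode": actual_mode}
    if honor_users_and_groups:
        pass  # user/group comparisons would go here
    if mode is not None and mode != mdata["mode"]:
        return "warning"
    return "ok"


if __name__ == "__main__":
    try:
        check_before_fix(False, "0755", "0700")
    except UnboundLocalError as err:
        print("before fix:", err)
    print("after fix:", check_after_fix(False, "0755", "0700"))
```

With honor_users_and_groups set to False, which is roughly the single-user mode case, only the first version fails.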

bgooley · ‎03-15-2019

@Somanath, We are sorry to hear that you are hitting a problem starting zookeeper. Before looking at doing a hard restart, let's verify some information about the issue. There are many possible causes of zookeeper not starting, so we need to understand your particular issue in order to suggest the best way forward.

(1) First, please share the agent log (by default /var/log/cloudera-scm-agent/cloudera-scm-agent.log) information that shows the zookeeper process trying to start.

(2) Next, if you see the line "Triggering supervisord update" following the lines referring to zookeeper, also review the stdout.log and stderr.log from your zookeeper process directory. You can list them with:

    # ls /var/run/cloudera-scm-agent/process/`ls -lrt /var/run/cloudera-scm-agent/process/ | awk '{print $9}' |grep -i ZOOKEEPER| tail -1`/logs

The above will list the logs directory contents for the most recent process. If it is empty, then that indicates that the supervisor was not signaled to start zookeeper. This would mean that the error happens during agent processing.

(3) What version of Cloudera Manager and CDH are you using?
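If the nested ls/awk/grep one-liner in step (2) is hard to read, the same idea (pick the most recently modified process directory whose name contains ZOOKEEPER) can be sketched in Python. The function name and its parameters are my own for illustration; the only path taken from the post is /var/run/cloudera-scm-agent/process.

```python
import os


def latest_process_dir(process_root, role="ZOOKEEPER"):
    """Return the newest directory under process_root whose name contains role."""
    candidates = []
    for name in os.listdir(process_root):
        full = os.path.join(process_root, name)
        # Case-insensitive match, like grep -i in the one-liner.
        if os.path.isdir(full) and role.lower() in name.lower():
            candidates.append((os.path.getmtime(full), full))
    if not candidates:
        return None
    # The largest modification time is the most recent directory.
    return max(candidates)[1]


if __name__ == "__main__":
    root = "/var/run/cloudera-scm-agent/process"
    if os.path.isdir(root):
        newest = latest_process_dir(root)
        logs = os.path.join(newest, "logs") if newest else None
        if logs and os.path.isdir(logs):
            print(logs, os.listdir(logs))
        else:
            print("no zookeeper logs directory found under", root)
```

As with the shell version, an empty logs directory would suggest the supervisor was never signaled to start the process.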

bgooley · ‎03-08-2019

Hello @henrikringcsc,

Thank you for bringing up this topic and doing so much research! The bug you mention was fixed some time ago, so what you are seeing now is different from the older bugs.

The error you see is:

    Caused by: KrbException: Illegal config content:includedir /etc/krb5.conf.d/

The error from the Java bug is:

    javax.security.auth.login.LoginException: KrbException: Config file must starts with a section

Based on the exception, your version, and the current Kerberos configuration file parsing "Config" class, we can guess that your krb5.conf had an "includedir" inside a section of krb5.conf which is, indeed, not legal for MIT or Java Kerberos implementations.

To put the theory to the test, I tested with Oracle JDK 1.8u162 and OpenJDK 1.8u201. When I had the following in my krb5.conf file, my HTTPFS server started fine:

    # Other applications require this directory to perform krb5 configuration.
    includedir /etc/krb5.conf.d/

    [libdefaults]

If, however, I added the "includedir" line after the libdefaults section title, I reproduced your exception. I did this:

    # Other applications require this directory to perform krb5 configuration.
    includedir /etc/krb5.conf.d/

    [libdefaults]
    includedir /etc/krb5.conf.d/

The exception when starting HTTPFS server (snipped):

    Caused by: KrbException: Illegal config content:includedir /etc/krb5.conf.d/
        at sun.security.krb5.Config.parseStanzaTable(Config.java:634)
        at sun.security.krb5.Config.<init>(Config.java:197)
        at sun.security.krb5.Config.getInstance(Config.java:98)

Looking at the latest code here, we can then see that parseStanzaTable comes after the krb5.conf file has been read into memory:

http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/4e2fc4ce3a1a/src/share/classes/sun/security/krb5/Config.java

In https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8029994 we see that the bug was in the loading of the configuration file, not in the parsing of the sections as we see in your exception.

CONCLUSION: If you wish to add "includedir" lines, they are fine, but they need to come before the first section title in your krb5.conf file in order for it to be valid.

If that doesn't work out, let us know and include your krb5.conf so we can understand how it is formatted.

Ben
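To check a krb5.conf against the rule in the conclusion above, a quick line scan like the following may help. It is a minimal sketch, not a real krb5.conf parser: it only flags "includedir" lines that appear after the first section title, and the function name is made up for the example.

```python
def misplaced_includedirs(krb5_text):
    """Return (line_number, line) for each includedir found after a section title."""
    seen_section = False
    misplaced = []
    for number, raw in enumerate(krb5_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            # Section titles look like [libdefaults], [realms], and so on.
            seen_section = True
        elif seen_section and line.split()[0] == "includedir":
            misplaced.append((number, line))
    return misplaced


if __name__ == "__main__":
    # Same layout as the failing example in the post.
    bad = (
        "# Other applications require this directory to perform krb5 configuration.\n"
        "includedir /etc/krb5.conf.d/\n"
        "\n"
        "[libdefaults]\n"
        "includedir /etc/krb5.conf.d/\n"
    )
    for number, line in misplaced_includedirs(bad):
        print("line %d: %s" % (number, line))
```

An empty result means every includedir comes before the first section title, which is the layout that started cleanly in my test.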

bgooley · ‎02-15-2019

@aalexand, CM/CDH 6.1 supports the use of OpenJDK 1.8, so you are good there...

Backing up a bit, looking at your first stack trace, we find that the failure occurs *after* the TLS handshake. See here:

    Traceback (most recent call last):
      File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1397, in _send_heartbeat
        response = self.requestor.request('heartbeat', heartbeat_data)

Line 1397 comes after a connection to the server has been established, so the original issue is not TLS related according to the call stack. Based on the last call, it appears the agent was waiting for the heartbeat response but 0 bytes were returned:

      File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
        raise ConnectionClosedException("Reader read 0 bytes.")
    ConnectionClosedException: Reader read 0 bytes.

Based on this, it appears that the agent received a non-avro response from the server. Among some other things, this could be caused by:

(1) The server not being Cloudera Manager. Check to make sure the server listening on port 7182 is actually CM. On the CM host, you can use:

    netstat -nap |grep 7182

(2) Cloudera Manager failed the processing of the heartbeat. Check the CM log to see if there are any messages at the time that the agent is showing the exception:

    /var/log/cloudera-scm-server/cloudera-scm-server.log

Hopefully one of those gives some more clues.

bgooley · ‎02-11-2019

@ChineduLB, Thanks for giving Cloudera and Hadoop a try. The quickstart VMs we have seem to only go up to 5.13 and we are currently at 6.1. If you are looking for a simpler way to get a test cluster set up on a single node, make sure you have a host with at least 10GB, and more realistically 16GB, so you can run all servers on the same host. Then this guide will help you: https://www.cloudera.com/documentation/enterprise/latest/topics/poc_installation.html If you are stuck on something in particular, though, please let us know. This community is happy to help.

bgooley · ‎02-08-2019

I'm glad that was it because I couldn't figure out many other possible causes of that sort of behavior :-).

Last Visited	‎04-24-2020 01:13 PM

Member Since	‎04-22-2014 02:47 PM
Posts	1,218
Kudos received	339

Cloudera Community

Re: ALL hadoop-mapreduce-examples.jar fail cdh6

Re: YARN NodeManagers failed to start with permiss...

Re: Disable admin Login in Cloudera Manager

Re: Kerberos not authenticating from Hadoop Gatewa...

Re: Sqoop connection to Kerberos authenticated RDB...

Re: Failed running performance inspector on host x...

Re: Failed running performance inspector on host x...

Re: Host Inspector TLS issue

Re: AuthenticationFailureEventListener: Authentica...

Re: Failed to execute command Start on service Zoo...

Re: Failed to execute command Start on service Zoo...

Re: Remark regarding Java 8 Kerberos configuration...

Re: scm-agent connected but not recognized by scm-...

Re: Cloudera manager install on single rhel7 linux...

Re: Hue: disabling LDAP does not enable local "Add...