Member since: 04-22-2014
Posts: 1218
Kudos Received: 341
Solutions: 157

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 26245 | 03-03-2020 08:12 AM |
|  | 16385 | 02-28-2020 10:43 AM |
|  | 4712 | 12-16-2019 12:59 PM |
|  | 4471 | 11-12-2019 03:28 PM |
|  | 6655 | 11-01-2019 09:01 AM |
05-24-2019
04:44 PM
@sree3192 , welcome to the Community. I started a new thread since your output indicates a different issue than the older thread to which you originally replied.

Key information:
- The problem occurs when importing credentials (import_credentials.sh).
- The error is "kinit: Client 'USERNAME-REDACTED' not found in Kerberos database while getting initial credentials".

The error comes from the MIT Kerberos libraries and means that the user (redacted in the output) cannot be found in the configured KDC. Please make sure you have created the user principal you specified for Cloudera Manager to use in order to import the admin user's keytab. For instance, if you typed in my_cm_user/admin, make sure that your KDC has a principal for that user.
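If the principal is indeed missing, a minimal sketch of creating and verifying it on an MIT KDC looks like the following (the principal name and realm here are placeholders for illustration, not values from this thread):

```bash
# On the KDC host, create the admin principal Cloudera Manager will use
# (names and realm are illustrative only).
kadmin.local -q "addprinc my_cm_user/admin@EXAMPLE.COM"

# Confirm the principal exists before retrying the credential import
# (import_credentials.sh) from Cloudera Manager.
kadmin.local -q "getprinc my_cm_user/admin@EXAMPLE.COM"
```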
05-24-2019
03:09 PM
1 Kudo
@Ryanp , you can find all of the file locations in the following documentation: https://www.cloudera.com/documentation/enterprise/6/6.2/topics/auto_tls.html#auto_tls_agent_files

The agent on each host will be configured to use the password stored in /var/lib/cloudera-scm-agent/agent-cert/cm-auto-host_key.pw.

Ben
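As a quick sanity check on an auto-TLS host, something like the following (a sketch, assuming default agent paths) confirms the password file is present and referenced by the agent configuration:

```bash
# The host key password file should exist under the agent certificate directory.
ls -l /var/lib/cloudera-scm-agent/agent-cert/cm-auto-host_key.pw

# The agent config should point at the auto-TLS files; exact property names
# can vary between Cloudera Manager versions, so grep for the directory.
grep -i 'agent-cert' /etc/cloudera-scm-agent/config.ini
```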
05-22-2019
10:45 AM
@BiggieSmalls , it would probably be a good idea to start a new thread for this new issue if more discussion is required.

To your question, though: the Cloudera Manager agent will run the following command to determine if chronyd is running:

# pidof chronyd

If the result code is "0", that means a chronyd process was running, so the agent will attempt to check the offset. This is done by running:

# chronyc -n sources

The agent then iterates over the output, line by line. If a line starts with "*", it will attempt to derive the offset from that line. The error you get, "not synchronized to any server", is returned if NO lines started with "*".

If you did not intend to use chronyd, you can shut it off on the problematic host and then make sure ntpd is working by using "ntpq -np" and confirming that one of the lines starts with a "*", which indicates the daemon is synchronized to that server.

Ben
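The logic described above can be approximated from a shell. This is only a rough sketch of the check, not the agent's actual implementation:

```bash
# Rough approximation of the agent's clock-sync check described above.
if pidof chronyd > /dev/null; then
    # A source line marked with "*" is the one chronyd is synchronized to.
    if chronyc -n sources | grep -q '\*'; then
        echo "chronyd is synchronized to a source"
    else
        echo "not synchronized to any server"
    fi
else
    # No chronyd: fall back to ntpd; a line starting with "*" means synced.
    if ntpq -np | grep -q '^\*'; then
        echo "ntpd is synchronized"
    else
        echo "ntpd is not synchronized"
    fi
fi
```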
05-22-2019
10:30 AM
@rdbb , it is possible to have even a one-node cluster (with CM and CDH), but that is certainly not something you would want to do if you want to leverage the inherent redundancy and load distribution of CDH. Functionally, you can put roles almost anywhere you please; as far as what is best for your desired use cases, though, we can't really comment without more context.

If you only have 4 hosts, you still need to consider what sort of memory the roles on those hosts will demand (as well as CPU and disk). Also, do you want to use High Availability for HDFS and YARN (NameNode and ResourceManager)?

Given enough RAM, disk, etc., you could put the master roles and edge node on one host and then use the other 3 as worker nodes. It is more about what you want to get out of this cluster, how important uptime is, and what resources you have. If you don't care too much and just want to play around, the configuration I described with master/edge on one host and workers on the other 3 is fine. In fact, that's basically what we do for our nightly builds here at Cloudera.

I hope that helps a bit. Feel free to ask more questions.
05-22-2019
10:10 AM
Hi @jess ; welcome to the Cloudera Community. To be sure we understand the problem, please share a screenshot or two of what you are seeing.

Make sure you click on the HDFS service and then look at the Instances tab to see which HDFS roles are in bad health. Also look at the "Health Tests" section to see if anything is reported there. Click on any roles that are in bad health to see more information about which health tests are failing.

Also, good job looking at the Service Monitor log for clues. Can you show us the stack trace or log messages that say "connection refused"? The Service Monitor makes connections to several servers, so it is important to know which one it was connecting to when the "connection refused" error occurred.

Thanks!
04-30-2019
10:33 AM
@srigowri, before looking at firewalls, let's find out more about the issue you are seeing. Please describe how the problem appears and include screenshots if possible.

I would also try running the following on the host where Cloudera Manager is installed:

# curl -u admin:lizard http://host-10-17-100-224.coe.cloudera.com:7180 -v
# netstat -nap | grep 718

If you can share that information with us, it will help us suggest further actions.
04-30-2019
10:18 AM
Hi @banshidhar_saho , since the stack trace shows RunJar.java being used, that indicates the Java option you need is:

java.io.tmpdir

If you can set that in your "Java Configuration" safety valves for YARN, that should help. Since we don't see the whole stack trace in your post, we can't tell exactly which safety valve would apply to that situation.
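For illustration only (the directory path below is an assumption, and which role's Java options to change depends on the full stack trace), the option would take a form like this when added to the relevant Java configuration options safety valve:

```bash
# JVM flag appended to the affected role's Java options; point it at a
# local directory with enough free space and the right permissions.
-Djava.io.tmpdir=/var/tmp/yarn-tmp
```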
04-28-2019
11:30 AM
@datasir, we need to know what you have configured in Cloudera Manager with regard to agent communication (primarily agent encryption and authorization). Also, check your Cloudera Manager log for messages at the same time as your agent error messages.

It would be a good idea to share the errors you are seeing so we can be sure we understand the issue. Also, post the configuration of the agent having the problem. This can be obtained with:

grep -v -e '^[[:space:]]*$' -e '^#' /etc/cloudera-scm-agent/config.ini
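If TLS is in play, a narrower check (a sketch only; property names can differ between Cloudera Manager versions) is to pull just the TLS-related agent settings so they can be compared against the agent security settings in Cloudera Manager:

```bash
# Show only TLS-related settings from the agent configuration file.
grep -E 'use_tls|verify_cert|client_key|client_cert' /etc/cloudera-scm-agent/config.ini
```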
04-26-2019
02:49 PM
@ditu, this thread is quite old, so it would be best to confirm you are seeing the same issue. What message do you see regarding the canary test failure?

Basically, the Service Monitor performs a health check of HDFS by writing out a file and making sure that completes. If it doesn't complete, that could mean a problem with HDFS that requires review, so this triggers a bad health state. The canary test does the following:

- creates a file
- writes to it
- reads it back
- verifies the data
- deletes the file

By default, the file name is: /tmp/.cloudera_health_monitoring_canary_files

It is possible that the Service Monitor log (in /var/log/cloudera-scm-firehose) has some error or exception reflecting the failure. Note that writing a file to HDFS requires communication with the NameNode and then with the DataNode that the NameNode tells the client to write to, so the failure could occur in various places.
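To narrow things down, one option is to approximate the canary by hand. This is a sketch only, not the Service Monitor's code, and the file name below is a placeholder chosen so it does not collide with the real canary file:

```bash
# Write, read back, and delete a small test file in HDFS, roughly mirroring
# the canary steps above. Run as a user with write access to /tmp in HDFS.
echo "manual canary" | hdfs dfs -put - /tmp/.manual_canary_check
hdfs dfs -cat /tmp/.manual_canary_check
hdfs dfs -rm -skipTrash /tmp/.manual_canary_check
```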
04-23-2019
11:44 AM
1 Kudo
@Bildervic, "SIGBUS (0x7)" can mean a few things, but one of the most common is that a directory Java needs to use is full (no more free disk space). The fact that your NodeManager was running, then failed, and then failed to start supports that type of cause. Since the crash is in libleveldbjni, that gives us more evidence that a directory may be full, because it indicates Java was accessing local files on disk.

I would suggest checking disk space on all volumes on that host. If there is a volume that is full, try freeing up some space and start the NodeManager again.
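A quick way to check (a generic sketch, not specific to any particular NodeManager directory layout):

```bash
# Look for any filesystem at or near 100% use on the affected host; pay
# particular attention to the volumes holding /tmp and the YARN local/log dirs.
df -h
```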