Member since: 04-22-2014
Posts: 1218
Kudos Received: 341
Solutions: 157
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 21949 | 03-03-2020 08:12 AM
 | 12569 | 02-28-2020 10:43 AM
 | 3618 | 12-16-2019 12:59 PM
 | 3181 | 11-12-2019 03:28 PM
 | 4971 | 11-01-2019 09:01 AM
03-31-2020
10:28 AM
1 Kudo
@Haris, Glad to hear you found the cause and solution. It took a lot of sweat and tears to get to that short list of possible causes for the condition, so I'm really glad it was one of them :-). Cheers!
03-24-2020
01:03 PM
Hi @WilsonLozano ,

Based on the fact that the ldapsearch command returned the object without issue, we can conclude that the bind user and password are correct. Thus, I believe the issue may involve referrals and how they are being followed. I find this odd, since I believe ldapgroupsmapping has referral following off by default. Nonetheless, we see in your ldapsearch result:

ref: ldap://DomainDnsZones.sub.us.domain.local/DC=DomainDnsZones,DC=sub,DC=us,DC=domain,DC=local

So, what I would suggest trying is either:

(1) Change your search base to something more specific, like "OU=Accounts,DC=sub,DC=us,DC=domain,DC=local", so that no referral is returned from Active Directory
(2) Try using the Global Catalog (port 3268, non-TLS)

I am pretty confident that referrals are involved, but I don't know why hadoop-common would be following them. Another thing you could do is use "tcpdump" to capture packets on port 389 and then use Wireshark to decode them. That would show us exactly what the client is trying to do and the server's response (in terms of the LDAP protocol).
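If it helps, here is a rough sketch of what the Global Catalog query and the packet capture might look like (the host, bind DN, search base, and filter are copied from the ldapsearch earlier in this thread, and the pcap file name is arbitrary, so adjust as needed):

ldapsearch -x -H ldap://sub.us.domain.local:3268 -D "ClouderaManager@SUB.US.DOMAIN.LOCAL" -W -b "DC=sub,DC=us,DC=domain,DC=local" "(&(objectClass=user)(sAMAccountName=c12345a))"

tcpdump -i any -s 0 -w ldap389.pcap port 389

You could then open ldap389.pcap in Wireshark and filter on "ldap" to follow the bind and search operations.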
03-23-2020
02:10 PM
1 Kudo
Hi @Haris ,

Thanks! That error shows the server could find no matching ciphers to allow the TLS handshake to occur. In the TLS 1.x handshake, the client sends a ClientHello message to the server, and the server picks the strongest cipher it supports that also appears in the client's list. If it cannot find any, it returns the error you mention.

There are a few reasons this error might happen:

(1) There is no private key in the NodeManager keystore
(2) There is a private key entry and a trusted certificate entry that share the same public certificate (this should only impact CDH 6 and higher)
(3) The keystore is in PKCS12 format (even though the file is named as if it were JKS)
(4) The client and server ciphers really don't overlap (super unlikely unless you have been changing cipher support)

For starters, it would be good to have a look at your server's keystore (the one it uses to start and listen via TLS). To do so you could use Java's keytool:

keytool -list -keystore /path/to/servers/jks/file

If you don't see any PrivateKeyEntry in the output, then it would seem there is no private key and the server cannot use TLS. If there is a PrivateKeyEntry, it could be that the file is not in JKS format. The best way to verify that the keystore you are using is in JKS format is to use the Linux "xxd" command like this:

xxd -l 10 /path/to/servers/jks/file

If the above command's output starts out like this, then it is a JKS file; if not, it is a different format:

0000000: feed feed
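As a side note, if the xxd output does not show those JKS magic bytes, one quick way to test whether the file is actually PKCS12 is to ask keytool to open it as that type (the path below is the same placeholder as above):

keytool -list -storetype PKCS12 -keystore /path/to/servers/jks/file

If that listing succeeds while the plain JKS listing fails, the keystore is in PKCS12 format and would need to be converted (or the service configured for PKCS12) before the NodeManager can use it.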
03-23-2020
01:36 PM
@WilsonLozano, I believe the error you are getting indicates that the bind user defined in hadoop.security.group.mapping.ldap.bind.user does not exist in the LDAP server, but I didn't search online for confirmation. You could try testing with ldapsearch, something like this:

ldapsearch -x -H ldap://sub.us.domain.local:389 -D "ClouderaManager@SUB.US.DOMAIN.LOCAL" -W -b "DC=sub,DC=us,DC=domain,DC=local" "(&(objectClass=user)(sAMAccountName=c12345a))"

If the above returns an error, you can enable debugging in ldapsearch to get a clearer picture of what failed by adding the "-d1" option to the command above (after -W, for instance).
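For clarity, the same search with debugging enabled would look something like this (only the -d1 option is added; everything else is unchanged from above):

ldapsearch -x -H ldap://sub.us.domain.local:389 -D "ClouderaManager@SUB.US.DOMAIN.LOCAL" -W -d1 -b "DC=sub,DC=us,DC=domain,DC=local" "(&(objectClass=user)(sAMAccountName=c12345a))"

The debug output prints each LDAP operation as it happens, which usually makes the failing step obvious.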
03-20-2020
03:11 PM
Hi @Haris ,
The error and stack trace show us that the agent attempted to connect to:
https://NodeManager.example.com:8042/jmx
However, the connection failed and the agent threw an exception:
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 295, in connect_ssl return m2.ssl_connect(self.ssl, self._timeout) SSLError: sslv3 alert handshake failure
In the above, we see that a call was made to ssl_connect but a failure alert was returned. This indicates that the server (the NodeManager) failed the TLS handshake for some reason. If this were an issue on the client (agent) side, we would expect a more descriptive error there about why the failure occurred.
If this is an error on the server side, it is possible the NodeManager log may have some information, but I doubt it.
I would recommend testing with curl on https://NodeManager.example.com:8042/jmx:
curl -v -k https://NodeManager.example.com:8042/jmx
The above should return JMX output; if it doesn't, share the error with us.
curl uses the OpenSSL libraries just as the agent does, so if curl works, the agent should too.
Note: does the problem happen every time, or only intermittently?
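One more data point that can help if curl also fails: openssl can show the handshake itself (the host and port are taken from the URL above):

openssl s_client -connect NodeManager.example.com:8042

The output shows whether the server presents a certificate at all and which protocol and cipher are negotiated, or the alert returned when the handshake fails.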
03-04-2020
09:07 AM
Hello @Axis_BDL ,

There can be several causes for no databases being found in the Hue Impala app, so it is a good idea to look for more information in the Hue log. Usually the log file is /var/log/hue/runcpserver.log. Perhaps try tailing that log while you reload the Impala page in Hue. We'll need some more information to learn what the problem is, so the Hue log is a good start.
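For example (assuming the default log location mentioned above):

tail -f /var/log/hue/runcpserver.log

Then reload the Impala page in Hue and watch for new errors or stack traces in the output.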
03-03-2020
08:12 AM
1 Kudo
That's great! You should be able to replace the "NOT FOUND" values for those two fields with:

-Djava.net.preferIPv4Stack=true

That is what CM usually sets by default; I'm not sure how the NOT FOUND values ended up there.
03-02-2020
11:33 AM
@HadoopBD, I was able to reproduce your symptoms, based on what I saw in the debug output from my successful run. Although I am sure there are a few ways this could happen, here is how I was able to get the same failure:

[2020-03-02 18:56:45.154] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class NOT

HOW I REPRODUCED THE ERROR:

(1) In Cloudera Manager, open the YARN configuration
(2) Search for Map Task Java Opts Base
(3) Update Map Task Java Opts Base by adding a space and then the word "NOT". For example: -Djava.net.preferIPv4Stack=true NOT
(4) Save the change, Deploy Client Configuration, and restart YARN (the restart is probably not necessary)
(5) Run: hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 4 4
(6) The result was the error above.

When I captured the application logs with the debugging I mentioned enabled, I could see that launch_container.sh issued the following Java command:

exec /bin/bash -c "$JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true NOT -Xmx820m ...

Since the word "NOT" does not have an option flag in front of it, Java interprets it as the class that should be run. Based on the above example, I would say that increasing the container log deletion delay and enabling debug as I described in previous posts will show us the problem.

Cheers,
Ben
03-02-2020
10:23 AM
Hello @HadoopBD ,

Thanks for providing the logs, but they do not contain what we would expect if you had followed the steps to enable container launch debug information. I am guessing you missed my steps during the threaded conversation. Basically, the standard logs show you some information, but not all. We are missing the actual files and log information about how the "launch_container" process was started and what was passed to the script used to execute the necessary Java command. We need to capture that information, because it will most likely give us some sort of clue about the cause of this issue. The ability to retain container launch information and have the "yarn logs" command collect it arrived in CM 6.3, which is why I wanted to find out whether you had that version. Here are the steps:

If you are on Cloudera Manager 6.3 or higher, you can try the following to collect more information about the container launch:

(1) Via Cloudera Manager, set the following configuration to 600 (10 minutes): Localized Dir Deletion Delay. This tells the NodeManager to wait 10 minutes before cleaning up the container launch directory, which helps us review the files used in the failed container launch.
(2) Set the following YARN configuration: Enable Container Launch Debug Information (check the box to enable it). This lets you collect extra container launch information in the "yarn logs -applicationId" output.
(3) Save your changes and then restart the YARN service from CM.
(4) Run a test MapReduce job (pi, for instance).
(5) After it fails, run the following to collect the aggregated logs for the job (a concrete example is sketched below): yarn logs -applicationId <app_id>  NOTE: you can redirect the output to a file so you can search in it.
(6) Look for "launch_container" in the output to find the launch information.

I just ran through a test, and a lot more detail about how the command will be launched is available. I truly believe it will help us identify a cause so we can find a solution.
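To make step (5) concrete, the capture-and-search might look like this (keep <app_id> as the application ID of your failed job; the output file name is just an example):

yarn logs -applicationId <app_id> > app_logs.txt
grep -n "launch_container" app_logs.txt

The grep -n output shows the line numbers where the container launch details start, so you can jump straight to them in the file.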
02-28-2020
04:38 PM
@Dombai_Gabor, I'm sorry to hear that... I think you mean that the OS won't boot; if so, let us know what happens and perhaps we can help. I'm not too familiar with OS boot debugging tactics offhand, but others might be able to provide some insight.