Created on 10-15-2019 07:04 AM - last edited on 10-15-2019 07:37 AM by ask_bill_brooks
Hi All,
When Kerberos and TLS are enabled on the CDH cluster, role logs are not reported in Cloudera Manager under the respective services. CM reports 'No log messages at the specified URL' for each of the following:
Interestingly, the log files mentioned on the log pages (in CM) exist on the underlying host and contain complete, correct log entries; they just don't get reported on the CM service log pages. Attempting to download the full log files throws this error:
HTTP ERROR 403
Problem accessing /cmf/process/335/logs. Reason: Unexpected end of file from server
The server declined access to the page or resource.
The same error is observed when fetching logs through the CM API.
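For anyone trying to reproduce the API failure, here is a minimal sketch of how the request could be built. The helper name role_log_url, the host, port, API version (v32 is assumed here for CM 6.2) and the cluster, service and role names are all placeholders that do not come from this thread; adjust them to your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the CM API URL for a role's full log.
# Arguments: base URL, API version, cluster name, service name, role name.
# Names containing spaces or special characters must be URL-encoded by the caller.
role_log_url() {
  local base_url="$1" api_version="$2" cluster="$3" service="$4" role="$5"
  # ${base_url%/} drops a single trailing slash so the path is not doubled.
  printf '%s/api/%s/clusters/%s/services/%s/roles/%s/logs/full\n' \
    "${base_url%/}" "$api_version" "$cluster" "$service" "$role"
}

# Example call (placeholders only, not executed here):
# curl -sS -u admin --cacert /path/to/ca.pem \
#   "$(role_log_url https://cm-host.example.com:7183 v32 Cluster1 hdfs hdfs-NAMENODE-1)"
```

If the agent side is misconfigured as described in this thread, a request like this would be expected to fail in the same way as the download link in the UI.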
Environment:
CDH | 6.2 |
OS | Redhat 7.7 |
Created 11-28-2019 09:28 AM
@SandeepSingh This looks like an issue with TLS.
Even though the flag 'Use TLS Authentication of Agents to Server' in the CM web UI is not set, the following parameters must be set for status_server to use TLS on port 9000. Go to the /opt/cloudera/security/x509/ directory and use the 'pem' and 'key' files under that directory. You may also need the password file for the private key, if there is one.
Then edit the /etc/cloudera-scm-agent/config.ini file and set the parameters below.
# PEM file containing client private key.
client_key_file=
# If client_keypw_cmd isn't specified, instead a text file containing the client private key password can be used.
client_keypw_file=
# PEM file containing client certificate.
client_cert_file=/etc/cdep-ssl-conf/CA_STANDARD/cm_server-cert.pem
verify_cert_file=
A restart of the status_server is required:
cd /var/run/cloudera-scm-agent/supervisord
/opt/cloudera/cm-agent/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf restart status_server
In addition, a restart of the cloudera-scm-agent is also needed:
service cloudera-scm-agent restart
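Before restarting, it can help to confirm which of the four parameters above are still empty. The following is only a sketch: the function name unset_tls_params is made up for illustration, and it simply scans the file for uncommented key=value lines without caring which section they are in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each agent TLS parameter that is missing or
# has an empty value in a config.ini-style file.
unset_tls_params() {
  local config_file="$1" key value
  for key in client_key_file client_keypw_file client_cert_file verify_cert_file; do
    # Use the last uncommented "key=value" line for this key; trim whitespace.
    value="$(awk -v k="$key" '
      index($0, "=") > 0 {
        name = $0; sub(/=.*/, "", name); gsub(/^[ \t]+|[ \t]+$/, "", name)
        if (name == k) {
          v = $0; sub(/^[^=]*=/, "", v); gsub(/^[ \t]+|[ \t]+$/, "", v); last = v
        }
      }
      END { print last }
    ' "$config_file")"
    if [ -z "$value" ]; then
      echo "$key"
    fi
  done
}

# Example call (not executed here):
# unset_tls_params /etc/cloudera-scm-agent/config.ini
```

Any name it prints is a parameter that still needs a value; an empty result means all four are set.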
Created 03-11-2020 09:26 AM
@GangWar Thanks for your suggestion.
All the parameters except the following one were already set in /etc/cloudera-scm-agent/config.ini:
verify_cert_file
Apparently, the only reason the agent wasn't serving requests for logs was that the above flag wasn't set.
The moment we configured verify_cert_file and restarted the agent, it started serving logs correctly.
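In case it helps someone else, the change can be sketched as a small script. This is an illustration only: set_ini_key is a made-up helper, and the CA path in the example call is a placeholder, since the actual file we used is site-specific (it is assumed here to be a PEM file with the CA certificates the agent should trust).

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the value of an existing, uncommented key in a
# config.ini-style file. Returns non-zero and leaves the file untouched if the
# key is not present.
set_ini_key() {
  local config_file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    {
      name = $0; sub(/=.*/, "", name); gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (index($0, "=") > 0 && name == k) { print k "=" v; found = 1 }
      else { print }
    }
    END { exit !found }
  ' "$config_file" > "$tmp" && cat "$tmp" > "$config_file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example call (placeholder CA path, not executed here), followed by the agent restart:
# set_ini_key /etc/cloudera-scm-agent/config.ini verify_cert_file /path/to/ca-chain.pem
# service cloudera-scm-agent restart
```

Writing through a temporary file and copying the content back keeps the original file's owner and permissions; taking a backup of config.ini first is still a good idea.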
Created 10-16-2019 03:49 AM
Hi Sandeep,
Can you please check the owner and permissions of "/var/run/cloudera-scm-server", i.e. whether the owner is root or cloudera-scm?
Thanks and Regards,
Bhuvan.
Created 10-16-2019 04:09 AM
Hi @Bhuv
Here are the owner/permission details:
drwxr-xr-x. 2 cloudera-scm cloudera-scm 40 Sep 24 12:05 cloudera-scm-server
Regards,
Sandeep
Created on 10-17-2019 11:14 AM - edited 10-17-2019 11:16 AM
@SandeepSingh If the same error can also be observed from the API, then this is most probably an issue with TLS. However, could you please try to narrow down the issue with the following methods:
/opt/cloudera/cm-agent/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf restart status_server
Please let us know how this goes. If the issue remains then please run the Host Inspector and share the result.
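To make the restart easier to repeat across hosts, the command above can be wrapped so that it also reports the process state afterwards. This is just a sketch: restart_status_server is a made-up function name, and the two variables default to the paths used in this thread but can be overridden.

```shell
#!/usr/bin/env bash
# Defaults taken from the command in this thread; override via the environment if needed.
SUPERVISORCTL="${SUPERVISORCTL:-/opt/cloudera/cm-agent/bin/supervisorctl}"
SUPERVISOR_CONF="${SUPERVISOR_CONF:-/var/run/cloudera-scm-agent/supervisor/supervisord.conf}"

# Hypothetical helper: restart status_server, then print its supervisor status.
restart_status_server() {
  "$SUPERVISORCTL" -c "$SUPERVISOR_CONF" restart status_server &&
    "$SUPERVISORCTL" -c "$SUPERVISOR_CONF" status status_server
}

# Example call on the problematic host (not executed here):
# restart_status_server
```

The status line should show whether status_server actually came back up after the restart.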
Created 10-17-2019 02:09 PM
Hi @SandeepSingh ,
Could you please share your Cloudera Manager version? I see you mentioned the CDH version is 6.2. What about the CM version? There was a bug in CM 6.1 where, under certain conditions, the CM Server fails to contact the agent when TLS is on. I am wondering if you are hitting that bug.
Thanks,
Li
Li Wang, Technical Solution Manager
Created 10-18-2019 09:00 AM
Created 10-18-2019 09:14 AM
@SandeepSingh Have you tried running the command below on the problematic host, as stated earlier?
/opt/cloudera/cm-agent/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf restart status_server
This bug has been fixed in 6.2. It may be that the status server was not restarted after making the changes. Please run the command, follow the other steps stated in the previous post, and let us know how this goes.
Created 10-23-2019 05:49 AM
Hi @GangWar
I tried both of the following commands; however, the result was exactly the same.
/opt/cloudera/cm-agent/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf restart status_server
I do indeed have the Enable Kerberos Authentication for HTTP Web-Consoles option checked, and I already tried configuring SPNEGO in the browsers (Mozilla Firefox as well as Chrome) as stated in the Cloudera documentation; no luck.
Host Inspector also failed to complete on all hosts of the cluster. Error Details:
IOException thrown while collecting data from host: Read timed out
stdout
"etcKrbConfMessages" : [ ], "extantInitdErrors" : [ ], "groupData" : "hdfs:x:11004:hive,impala\nmapred:x:11014:\nzookeeper:x:11022:\noozie:x:11015:\nhbase:x:11003:\nhue:x:11007:\ncloudera-scm:x:991:\nhadoop:x:11002:hdfs,hive,impala,mapred,yarn\nhive:x:11005:impala\nsqoop:x:11019:\nimpala:x:11008:\nyarn:x:11021:\nhttpfs:x:11006:\nsentry:x:11018:\nsolr:x:11016:\nspark:x:11017:\nkms:x:11010:\n", "hostDnsErrors" : [ ], "hostname" : "dwz-06.nonprod.lan", "jceStrength" : 0, "kdcConnectionMessages" : [ ], "kernelVersion" : "3.10.0-1062.1.2.el7.x86_64", "kernelVersionException" : null, "localHostIpError" : null, "localhostIp" : "127.0.0.1", "nowMillis" : 1571833401403, "psycopg2Version" : "2.7.5", "psycopg2VersionException" : null, "psycopg2VersionOk" : true, "pythonVersionException" : null, "pythonVersionOk" : true, "pythonVersionString" : "2.7", "rhelRelease" : "Red Hat Enterprise Linux Server release 7.7 (Maipo)", "runExceptions" : [ ], "swappiness" : "60", "swappinessException" : null, "timeZone" : "UTC+01:00", "transparentHugePagesDefrag" : "[always] madvise never", "transparentHugePagesEnabled" : "always madvise [never]", "transparentHugePagesException" : null, "transparentHugePagesPath" : "/sys/kernel/mm/transparent_hugepage", "userData" : "hdfs:x:980:11004::/home/hdfs:/bin/bash\nmapred:x:977:11014::/home/mapred:/bin/bash\nzookeeper:x:981:11022::/home/zookeeper:/bin/bash\noozie:x:986:11015::/home/oozie:/bin/bash\nhbase:x:994:11003::/home/hbase:/bin/bash\nhue:x:992:11007::/home/hue:/bin/bash\ncloudera-scm:x:974:991:Cloudera Manager:/var/lib/cloudera-scm-server:/sbin/nologin\nsqoop:x:982:11019::/home/sqoop:/bin/bash\nsqoop2:x:976:11019::/home/sqoop2:/bin/bash\nimpala:x:978:11008::/home/impala:/bin/bash\nyarn:x:975:11021::/home/yarn:/bin/bash\nhttpfs:x:993:11006::/home/httpfs:/bin/bash\nsentry:x:983:11018::/home/sentry:/bin/bash\nsolr:x:985:11016::/home/solr:/bin/bash\nspark:x:984:11017::/home/spark:/bin/bash\nkms:x:990:11010::/home/kms:/bin/bash\n" }
stderr
+ [[ inspector == \f\i\r\e\h\o\s\e ]] + [[ inspector == \e\v\e\n\t\s\e\r\v\e\r ]] + [[ inspector == \a\l\e\r\t\p\u\b\l\i\s\h\e\r ]] + [[ inspector == \h\e\a\d\l\a\m\p ]] + [[ inspector == \t\e\l\e\m\e\t\r\y\p\u\b\l\i\s\h\e\r ]] + [[ inspector == \t\e\s\t\-\d\b\u\s\-\c\o\n\n\e\c\t\i\o\n ]] + [[ inspector == \i\n\s\p\e\c\t\o\r ]] + shift ++ pwd + MGMT_CLASSPATH='/run/cloudera-scm-agent/process/86-host-inspector:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*:' + echo_and_exec /usr/java/jdk1.8.0_131/bin/java -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Dmgmt.log.file= -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -cp '/run/cloudera-scm-agent/process/86-host-inspector:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*:' com.cloudera.cmf.inspector.Inspector input.json output.json DEFAULT + echo 'Executing: /usr/java/jdk1.8.0_131/bin/java' -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Dmgmt.log.file= -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -cp '/run/cloudera-scm-agent/process/86-host-inspector:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*:' com.cloudera.cmf.inspector.Inspector input.json output.json DEFAULT + exec /usr/java/jdk1.8.0_131/bin/java -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Dmgmt.log.file= -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -cp '/run/cloudera-scm-agent/process/86-host-inspector:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*:' com.cloudera.cmf.inspector.Inspector input.json output.json DEFAULT
Here is what CM shows on Host Inspection result page:
Inspector failed on the following hosts... View Details
The inspector failed to run on all hosts.
0 hosts are running CDH 5 and 7 hosts are running CDH 6.
All checked hosts in each cluster are running the same version of components.
All managed hosts have consistent versions of Java.
All checked Cloudera Management Daemons versions are consistent with the server.
All checked Cloudera Management Agents versions are consistent with the server.
All Hosts: dwz-[01-07].nonprod.lan
Component | Version | Hosts | Release | CDH Version |
Supervisord | 3.0 | All Hosts | Unavailable | Not applicable |
Cloudera Manager Agent | 6.2.0 | All Hosts | 968826.el7 | Not applicable |
Cloudera Manager Management Daemons | 6.2.0 | All Hosts | 968826.el7 | Not applicable |
Flume NG | 1.9.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Hadoop | 3.0.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
HDFS | 3.0.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
HttpFS | 3.0.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
hadoop-kms | 3.0.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
MapReduce 2 | 3.0.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
YARN | 3.0.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
HBase | 2.1.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Lily HBase Indexer | 1.5+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Hive | 2.1.1+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
HCatalog | 2.1.1+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Hue | 4.2.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Impala | 3.2.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Java 8 | 1.8.0_131 | All Hosts | Unavailable | Not applicable |
Kafka | 2.1.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Kite | 1.0.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
kudu | 1.9.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Oozie | 5.1.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Parquet | 1.9.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Pig | 0.17.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
sentry | 2.1.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Solr | 7.4.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
spark | 2.4.0+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Sqoop | 1.4.7+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Zookeeper | 3.4.5+cdh6.2.0 | All Hosts | 967373 | CDH 6 |
Regards,
Sandeep Singh
Created 10-24-2019 11:32 AM
@SandeepSingh A quick thing I noticed is that RHEL 7.7 is not supported for CDH 6.2:
Created 10-29-2019 09:28 AM
Indeed, I'm using RHEL 7.7. The same behavior is observed on RHEL 7.6 as well.
I have performed some tests, and everything works smoothly on both OS releases (7.6 as well as 7.7) when SSL is not enabled.
I would guess that setting Enable Kerberos Authentication for HTTP Web-Consoles only affects web consoles such as the NameNode UI and shouldn't have any impact on CM's ability to read service logs from their respective hosts.
Regards
Sandeep Singh