Member since: 05-09-2017
Posts: 107
Kudos Received: 7
Solutions: 6

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3958 | 03-19-2020 01:30 PM
 | 20345 | 11-27-2019 08:22 AM
 | 10298 | 07-05-2019 08:21 AM
 | 17260 | 09-25-2018 12:09 PM
 | 6603 | 08-10-2018 07:46 AM
12-06-2018
01:33 PM
1 Kudo
Yes, we only tried deleting the out-of-sync partition, and it did not work. After a lot of research we decided to increase replica.lag.time.max.ms to 8 days, since a few replicas had been out of sync for roughly that long. This resolved our issue, although it took a few hours for the followers to fetch and replicate the 7 days of data. https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/ helped us understand ISRs.
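For reference, a minimal sketch of the change, assuming a plain server.properties-managed broker (on CDH the value is set through Cloudera Manager's Kafka configuration instead; the path here is illustrative):

```bash
# 8 days in milliseconds:
echo $((8 * 24 * 60 * 60 * 1000))   # 691200000

# Set replica.lag.time.max.ms on each broker, then restart the broker.
# /etc/kafka/server.properties is an assumed location, not a CDH-managed path.
grep -q '^replica.lag.time.max.ms=' /etc/kafka/server.properties \
  && sed -i 's/^replica.lag.time.max.ms=.*/replica.lag.time.max.ms=691200000/' /etc/kafka/server.properties \
  || echo 'replica.lag.time.max.ms=691200000' >> /etc/kafka/server.properties
```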
11-18-2018
05:38 AM
Thank you @bgooley, that's what I did and it worked. I appreciate your response.
11-16-2018
07:25 AM
2 Kudos
This was an issue with that consumer group in __consumer_offsets, and these are the steps we took to fix it.

On a single broker:

1) Generate a DumpLogSegments command for every __consumer_offsets log file:

find /kafka/data -name "*.log" | grep -i consumer | awk '{print "kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files " $1}'

2) Run each generated command on that broker and grep for the consumer group "prod-abc-events", e.g.:

kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log | grep -i 'prod-abc-events'

Repeat the steps above on all the brokers and make a list of every file that references 'prod-abc-events'. In our case we found three:

broker1: /kafka/data/sda/__consumer_offsets-24/00000000000000000000.log
broker2: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log
broker3: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log

We noticed that the .log file on broker1 differed in size and content from the other two. We backed up the file on broker1 and replaced it with the copy from broker2, and that resolved the issue. Most likely this happened when we ran kafka-reassign-partitions while the drives reached 99% and something broke in __consumer_offsets.
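For anyone repeating this, a hedged convenience loop that does steps 1 and 2 in a single pass on one broker (assumes the same /kafka/data layout as above):

```bash
# Scan every __consumer_offsets segment on this broker for the consumer group
# and print the files that mention it.
GROUP='prod-abc-events'
find /kafka/data -path '*__consumer_offsets*' -name '*.log' | while read -r f; do
  kafka-run-class kafka.tools.DumpLogSegments --deep-iteration \
    --print-data-log --files "$f" | grep -qi "$GROUP" && echo "MATCH: $f"
done
```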
11-16-2018
07:11 AM
1 Kudo
The /var/lib/kms-keytrustee/keytrustee/.keytrustee folder on both KMS hosts should match, and unfortunately in our cluster it did not. So if a key-create request went to one KMS host and the retrieval went to the other, the command failed.

KMS host 1:
[root@host]# md5sum /var/lib/kms-keytrustee/keytrustee/.keytrustee/secring.gpg
fec74c82e3da7f04f2acd36a937072b5 /var/lib/kms-keytrustee/keytrustee/.keytrustee/secring.gpg

KMS host 2:
[root@host]# md5sum /var/lib/kms-keytrustee/keytrustee/.keytrustee/secring.gpg
88483e6a8ee1d245d3c83b740fd43683 /var/lib/kms-keytrustee/keytrustee/.keytrustee/secring.gpg

We used the BDR tool to take a backup of the encrypted zones in the same cluster, purged all keys, and dropped all zones. Then we used rsync to sync /var/lib/kms-keytrustee/keytrustee/.keytrustee on both KMS hosts, recreated all keys and zones, and used BDR to restore the data from the backup. Everything looks good now!
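For reference, a minimal sketch of the sync-and-verify step; kms-host2 is a placeholder for the second KMS host. Run it from the host whose keyring you decided to keep, only after backing everything up and purging keys/dropping zones as described above:

```bash
# Push the keyring directory from this host to the other KMS host.
rsync -av --delete /var/lib/kms-keytrustee/keytrustee/.keytrustee/ \
  kms-host2:/var/lib/kms-keytrustee/keytrustee/.keytrustee/

# Verify the keyrings now match on both hosts:
md5sum /var/lib/kms-keytrustee/keytrustee/.keytrustee/secring.gpg
ssh kms-host2 md5sum /var/lib/kms-keytrustee/keytrustee/.keytrustee/secring.gpg
```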
10-12-2018
09:40 AM
@desind, If none of your clients is breaking and everything looks healthy in Cloudera Manager, then it may not be necessary to dig deeper at this time. If you do want to, you could run a tcpdump on port 7183 on your CM host, let it run for a bit, then read the capture in Wireshark to track down which SSL handshakes are failing and which client is involved.
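For example, a minimal capture could look like this (the interface and output path are placeholders):

```bash
# Capture TLS traffic to the CM web server; stop with Ctrl-C after a few minutes.
tcpdump -i any -w /tmp/cm-7183.pcap port 7183
# Then open /tmp/cm-7183.pcap in Wireshark and look for handshake failures
# (e.g. filter on "ssl.alert_message") to identify the failing clients.
```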
09-25-2018
12:09 PM
I was able to resolve this issue by moving the user and group under one OU; most likely it cannot do a backward search. Thank you @bgooley, much appreciated. I learned a few things in this process.
09-18-2018
08:41 AM
4 Kudos
@desind, There are a few ways to enable DEBUG or TRACE, depending on what sort of problem you are attempting to troubleshoot.

(1) If CM won't start, or you do not have an idea which classes are involved, you can enable DEBUG or TRACE for the whole server. Warning: this can be very verbose, so it is likely going to be difficult to capture an event.

- Edit /usr/sbin/cmf-server in CM 5, or /opt/cloudera/cm/bin/cm-server in CM 6
- Change this:
export CMF_ROOT_LOGGER="INFO,LOGFILE"
to:
export CMF_ROOT_LOGGER="DEBUG,LOGFILE"
- Restart CM to apply the change.

(2) If you know which class or package you want to DEBUG, you can edit /etc/cloudera-scm-server/log4j.properties. Add lines as follows; this example turns on tracing for just the LDAP classes in SpringFramework (used in LDAP authentication):

log4j.logger.org.springframework.ldap=TRACE
log4j.logger.org.springframework.security.ldap=TRACE

Restart CM to apply the changes.

(3) If you want to turn on debug- or trace-level logging for just the current session of Cloudera Manager, you can use the debug page:

https://cm_host:cm_port/cmf/debug/logLevel

- Choose the Logger from the drop-down
- Select the level to which you want to change the logging
- Click the "Submit Query" button to apply

The log level you selected will only apply until you restart Cloudera Manager.

(4) API debugging. You can enable API debugging in the Cloudera Manager interface:

- Navigate to Administration --> Settings
- Search for "Enable Debugging of API"
- Check the box next to it and Save

API debugging will be written to /var/log/cloudera-scm-server/cloudera-scm-server.log without a restart.

(5) NOTE: If you do enable verbose debugging, you may need to increase the size or number of log files to be able to review the relevant lines. To do so, I believe you can simply edit the following in /etc/cloudera-scm-server/log4j.properties:

log4j.appender.LOGFILE.MaxFileSize=10MB
log4j.appender.LOGFILE.MaxBackupIndex=10
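For convenience, a hedged one-shot sketch of edits (2) and (5) together. The MaxFileSize/MaxBackupIndex values below are just example increases, and the restart command assumes a CM 5 style init service; adjust both to your environment:

```bash
# Append the LDAP trace loggers and larger rotation settings to the CM server
# log4j config (log4j uses the last value it sees for a duplicated key),
# then restart the server so the changes take effect.
cat >> /etc/cloudera-scm-server/log4j.properties <<'EOF'
log4j.logger.org.springframework.ldap=TRACE
log4j.logger.org.springframework.security.ldap=TRACE
log4j.appender.LOGFILE.MaxFileSize=100MB
log4j.appender.LOGFILE.MaxBackupIndex=20
EOF
service cloudera-scm-server restart
```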
09-14-2018
05:57 PM
@desind, No limit that I know of on the CM side. Please start a new thread and provide your LDAP configuration, what happens in the logs, and the LDIF entry for the "abc_efg_scd_dfc" user. There are lots of possible reasons for failures, so it is important we start with what you observe and the items involved.
07-27-2018
10:55 AM
I am seeing a similar issue with Service Monitor and Host Monitor when using Red Hat 6.8 (Santiago); CM/CDH is 5.11.1. After adding JAVA_TOOL_OPTIONS=-Xss2m to the Host Monitor and Service Monitor configuration, it works fine. Is this a known issue with Red Hat 6.7 as well? (The link you mentioned is for CentOS, and it's 6.9.)
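For anyone hitting the same thing, this is what the setting amounts to (in CM it belongs in the Host Monitor / Service Monitor environment safety valve, shown here as a plain environment variable):

```bash
# Bump the JVM thread stack size to 2 MB for the monitor processes.
export JAVA_TOOL_OPTIONS=-Xss2m
```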
07-18-2018
01:26 AM
OK, I understand your point, but what if the mappers are failing? YARN already sets up as many mappers as there are files; should I increase this further? Since only a minority of my jobs are failing, how can I tune YARN to use more mappers for these particular jobs?