Member since: 05-09-2017
Posts: 107
Kudos Received: 7
Solutions: 6
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 2983 | 03-19-2020 01:30 PM |
|  | 15370 | 11-27-2019 08:22 AM |
|  | 8473 | 07-05-2019 08:21 AM |
|  | 14860 | 09-25-2018 12:09 PM |
|  | 5587 | 08-10-2018 07:46 AM |
05-15-2019
04:25 AM
Hello, can I run the HDFS disk balancer for multiple hosts simultaneously? I tried it and it throws an error that is misleading, since one iteration is already in progress:

19/05/15 07:19:24 ERROR tools.DiskBalancerCLI: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.diskbalancer.DiskBalancerException): Disk Balancer is not enabled.

Thank you,
Des
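For context, the "Disk Balancer is not enabled" message usually points at the dfs.disk.balancer.enabled setting rather than at a concurrency limit; the disk balancer is planned and executed per DataNode, so in principle separate hosts can be balanced in parallel. A minimal sketch, assuming dfs.disk.balancer.enabled is set to true in hdfs-site.xml on the DataNodes and using dn1/dn2 as placeholder hostnames:

```shell
# Prerequisite in hdfs-site.xml on the DataNodes (requires a restart):
#   <property>
#     <name>dfs.disk.balancer.enabled</name>
#     <value>true</value>
#   </property>

# Build a plan per DataNode (dn1, dn2 are placeholder hostnames)
hdfs diskbalancer -plan dn1.example.com
hdfs diskbalancer -plan dn2.example.com

# Execute each plan; a plan targets a single DataNode, so plans for
# different hosts can run at the same time (plan paths may vary by version)
hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/dn1.example.com.plan.json
hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/dn2.example.com.plan.json

# Check progress on a given DataNode
hdfs diskbalancer -query dn1.example.com
```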
Labels:
- HDFS
04-14-2019
12:00 PM
I intermittently see these errors with Key Trustee KMS. When I check, both Key Trustee KMS servers are up and running without any issues (hadoop key list and fetching key metadata from each KMS server work fine without errors). We see these errors intermittently in Pig jobs, and the job succeeds when we rerun it. Can anyone shed light on this issue and the next steps in troubleshooting?

2019-04-14 04:00:55,380 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-04-14 04:00:55,380 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2019-04-14 04:00:56,160 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-04-14 04:00:56,190 [JobControl] INFO org.apache.hadoop.hdfs.DFSClient - Created token for func_svc_dig_11ent: HDFS_DELEGATION_TOKEN owner=func_abc@VSP.SAS.COM, renewer=yarn, realUser=, issueDate=1555228856156, maxDate=1555836656176, sequenceNumber=96594554, masterKeyId=1976 on ha-hdfs:nameservice1
2019-04-14 04:00:56,367 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:func_abc@VSP.SAS.COM (auth:KERBEROS) cause:org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Connection refused)
2019-04-14 04:00:56,370 [JobControl] WARN org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider - KMS provider at [https://cdn84au.xxx.xxx.com:16000/kms/v1/] threw an IOException:
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
 at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:1024)
 at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:193)
 at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:190)
 at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:123)
 at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.addDelegationTokens(LoadBalancingKMSClientProvider.java:190)
 at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:110)
 at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2333)
 at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:140)
 at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
 at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
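Not part of the original post, but one way to narrow down intermittent LoadBalancingKMSClientProvider failures like this is to query each KMS instance directly instead of going through the load-balanced provider, so a failing node shows up on its own. A minimal sketch, assuming a valid Kerberos ticket and using the host from the trace plus a placeholder for the second Key Trustee KMS server:

```shell
# Confirm the Kerberos credentials the job would use
klist

# Query each KMS instance directly (bypasses LoadBalancingKMSClientProvider);
# <second-kms-host> is a placeholder for the other Key Trustee KMS server
hadoop key list -provider kms://https@cdn84au.xxx.xxx.com:16000/kms
hadoop key list -provider kms://https@<second-kms-host>:16000/kms

# Check basic reachability of the KMS endpoint from the node running the job;
# "Connection refused" inside the GSSException usually points here
curl -vk https://cdn84au.xxx.xxx.com:16000/kms/v1/
```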
Labels:
- Cloudera Navigator
- Kerberos
12-06-2018
01:33 PM
1 Kudo
Yes, we only tried deleting the out-of-sync partition, and it did not work. After a lot of research we decided to increase replica.lag.time.max.ms to 8 days, since a few replicas had been out of sync for around 8 days. This resolved our issue, although it took a few hours for the followers to fetch and replicate the 7 days of data. https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/ helped us understand ISRs.
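For anyone applying the same workaround, this is a broker-level setting; roughly, the change looks like the sketch below (691200000 is simply 8 days expressed in milliseconds, and whether a rolling restart is needed depends on your Kafka version and how the config is managed):

```shell
# replica.lag.time.max.ms is set per broker in server.properties
# (or via the Kafka broker safety valve in Cloudera Manager).

# 8 days expressed in milliseconds:
echo $((8 * 24 * 60 * 60 * 1000))   # prints 691200000

# Resulting broker config line:
#   replica.lag.time.max.ms=691200000
# Restart the brokers so the new value takes effect, then watch the
# under-replicated partition count drop as followers catch up.
```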
12-05-2018
07:44 AM
I agree with your suggestion and we are in the process of testing this in staging. Unfortunately we don't want to try it on the problematic cluster, which is production, as we might corrupt something. It's hard to replicate the issue in the staging environment, but we can at least try an rm -rf of one replica, restart the broker, and see how it behaves. After doing some research, this is the issue we are facing: https://issues.apache.org/jira/browse/KAFKA-6361
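A rough sketch of that staging test, in case it helps someone else. The data directory and topic-partition below are placeholders taken from elsewhere in this thread, and this should only be tried where losing the local copy of the replica is acceptable, since the broker has to re-fetch the whole partition from the leader:

```shell
# Staging only. Stop the broker first (via Cloudera Manager or your
# service manager) before touching its log directories.

# Back up, then remove, the follower's partition directory
# (/kafka/data/sdc and dev-raw-events-35 are placeholders)
cp -a /kafka/data/sdc/dev-raw-events-35 /var/tmp/dev-raw-events-35.bkp
rm -rf /kafka/data/sdc/dev-raw-events-35

# Start the broker again; the replica fetcher should recreate the
# partition from the current leader and eventually rejoin the ISR.
```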
11-27-2018
07:45 AM
Is there a way I can force the replica to catch up with the leader? The replica has been out of sync for over 24 hours, and unclean.leader.election is enabled in the cluster.

What I have tried so far:
- Restarted the broker: I don't see any movement, and the set of under-replicated partitions changes.
- Moved the replica to a different broker: it was unsuccessful and the reassignment got stuck.
- Created an additional replica: kafka-reassign-partitions is also stuck waiting for the out-of-sync replica to catch up to the leader.

Logs:
ERROR kafka.server.ReplicaFetcherThread: [ReplicaFetcher replicaId=99, leaderId=157, fetcherId=0] Error due to kafka.common.KafkaException: Error processing data for partition dev-raw-events-35 offset 111478948
Caused by: kafka.common.UnexpectedAppendOffsetException: Unexpected offset in append to dev-raw-events-35. First offset or last offset of the first batch 111478933 is less than the next offset 111478948. First 10 offsets in append: List(111478933, 111478934, 111478935, 111478936, 111478937, 111478938, 111478939, 111478940, 111478941, 111478942), last offset in append: 111479224. Log start offset = 95104666

What can I do to fix this issue?
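Not from the original post, but for anyone hitting the same symptom, these are the commands typically used to see which partitions are under-replicated and whether a reassignment is actually progressing (the ZooKeeper connect string and file name are placeholders; the topic name is the one from the log above):

```shell
# List partitions whose ISR is smaller than the replica set
kafka-topics --zookeeper zk1.example.com:2181 --describe --under-replicated-partitions

# Show leader, replicas and ISR for the affected topic
kafka-topics --zookeeper zk1.example.com:2181 --describe --topic dev-raw-events

# Check whether an in-flight reassignment has completed
kafka-reassign-partitions --zookeeper zk1.example.com:2181 \
  --reassignment-json-file reassignment.json --verify
```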
Labels:
- Apache Kafka
11-21-2018
08:05 AM
As a workaround, what I did was back up this directory and restart the Navigator Metadata Server. It will rebuild all indexes from scratch; just verify your audits after the restart.

sudo mv /dcf/hdp/cloudera-scm-navigator /dcf/hdp/cloudera-scm-navigator.bkp

Then restart Navigator.
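The same workaround with the stop/start steps spelled out; the path is the one from this cluster, and stopping/starting the role is assumed to be done from Cloudera Manager:

```shell
# 1) Stop the Navigator Metadata Server role in Cloudera Manager.

# 2) Move the existing storage directory aside so it gets rebuilt.
sudo mv /dcf/hdp/cloudera-scm-navigator /dcf/hdp/cloudera-scm-navigator.bkp

# 3) Start the Navigator Metadata Server role again; it rebuilds the
#    indexes from scratch, so verify audits/lineage once it catches up.
```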
11-18-2018
05:38 AM
Thank you @bgooley, that's what I did and it worked. Thank you, I appreciate your response.
11-16-2018
01:11 PM
Updating indexes every 3600 seconds
Location of indexes: /xxx/hadoop/cloudera-scm-headlamp
Exception occurred at November 16, 2018 9:01:16 PM +00:00
java.io.IOException: No sub-file with id .fnm found (fileName=_1d.cfs files: [])
 at org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:155)
 at org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:144)
 at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:74)
 at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:73)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:93)
 at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
 at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
 at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
 at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754)
 at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:421)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:281)
 at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:89)
 at com.cloudera.headlamp.HeadlampServiceImpl.setData(HeadlampServiceImpl.java:210)
 at com.cloudera.headlamp.HeadlampIndex.loadNewSearchIndex(HeadlampIndex.java:220)
 at com.cloudera.headlamp.HeadlampIndex.<init>(HeadlampIndex.java:101)
 at com.cloudera.headlamp.HeadlampIndexManager.getOrCreateIndex(HeadlampIndexManager.java:171)
 at com.cloudera.headlamp.HeadlampIndexManager.reindexIndexes(HeadlampIndexManager.java:235)
 at com.cloudera.headlamp.HeadlampIndexManager.access$100(HeadlampIndexManager.java:58)
 at com.cloudera.headlamp.HeadlampIndexManager$1.run(HeadlampIndexManager.java:492)
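For readers who hit this same Lucene error from the Reports Manager (headlamp) index: one common recovery pattern, not confirmed in this thread as the exact fix, is to move the corrupted index directory aside and let it rebuild (the path below is the one reported above):

```shell
# Stop the Reports Manager role in Cloudera Manager first.

# Move the corrupted headlamp index directory out of the way.
sudo mv /xxx/hadoop/cloudera-scm-headlamp /xxx/hadoop/cloudera-scm-headlamp.bkp

# Start Reports Manager again; it rebuilds its index from HDFS metadata,
# which can take a while on large clusters.
```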
Labels:
- Cloudera Manager
11-16-2018
07:25 AM
2 Kudos
This was an issue with that consumer group in __consumer_offsets, and these are the steps we took to fix it.

On a single broker, run the following:

1) Generate a DumpLogSegments command for every __consumer_offsets log file:

find /kafka/data -name "*.log" | grep -i consumer | awk '{a=$1;b="kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files "a; print b}'

2) Run each generated command on that broker to see which log file contains the consumer group "prod-abc-events", for example:

kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log | grep -i 'prod-abc-events'

Repeat the steps above on all the brokers and make a list of all the files that reference 'prod-abc-events'. In our instance we found 3 files that referenced this group:

broker1: /kafka/data/sda/__consumer_offsets-24/00000000000000000000.log
broker2: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log
broker3: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log

We noticed that the .log file on broker1 was different in size and content from the remaining two. We backed up the file from broker1 and then replaced it with the one from broker2, and that resolved the issue. Most likely this happened when we ran kafka-reassign-partitions, the drives reached 99%, and then something broke in __consumer_offsets.
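A compact way to run the same search on a broker, folding steps 1) and 2) into one loop; the data path and group name are the ones from this post, so adjust them to your layout:

```shell
#!/bin/bash
# For every __consumer_offsets log segment on this broker, dump it and
# check whether it references the problematic consumer group.
GROUP="prod-abc-events"

find /kafka/data -name "*.log" | grep -i consumer | while read -r f; do
  if kafka-run-class kafka.tools.DumpLogSegments --deep-iteration \
       --print-data-log -files "$f" | grep -qi "$GROUP"; then
    echo "group $GROUP found in: $f"
  fi
done
```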