Member since: 05-09-2017
Posts: 107
Kudos Received: 7
Solutions: 6
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 2983 | 03-19-2020 01:30 PM |
|  | 15370 | 11-27-2019 08:22 AM |
|  | 8473 | 07-05-2019 08:21 AM |
|  | 14860 | 09-25-2018 12:09 PM |
|  | 5587 | 08-10-2018 07:46 AM |
05-15-2019
04:25 AM
Hello, can I run the HDFS disk balancer for multiple hosts simultaneously? I tried it and it throws an error that is misleading, since one iteration is already in progress:

19/05/15 07:19:24 ERROR tools.DiskBalancerCLI: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.diskbalancer.DiskBalancerException): Disk Balancer is not enabled.

Thank you,
Des
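For context, the "Disk Balancer is not enabled" message usually points at the dfs.disk.balancer.enabled setting rather than at a concurrency limit; the disk balancer is planned and executed per DataNode, so in principle separate hosts can be balanced in parallel. A minimal sketch, assuming dfs.disk.balancer.enabled is set to true in hdfs-site.xml on the DataNodes and using dn1/dn2 as placeholder hostnames:

```shell
# Prerequisite in hdfs-site.xml on the DataNodes (requires a restart):
#   <property>
#     <name>dfs.disk.balancer.enabled</name>
#     <value>true</value>
#   </property>

# Build a plan per DataNode (dn1, dn2 are placeholder hostnames)
hdfs diskbalancer -plan dn1.example.com
hdfs diskbalancer -plan dn2.example.com

# Execute each plan; a plan targets a single DataNode, so plans for
# different hosts can run at the same time (plan paths may vary by version)
hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/dn1.example.com.plan.json
hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/dn2.example.com.plan.json

# Check progress on a given DataNode
hdfs diskbalancer -query dn1.example.com
```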
Labels:
- HDFS
04-14-2019
12:00 PM
I intermittently see these errors with Key Trustee KMS. When I check, both Key Trustee KMS servers are up and running without any issues (hadoop key list and fetching key metadata from each KMS server work fine without errors). We see these errors intermittently in Pig jobs, and the job succeeds when we rerun it. Can anyone shed light on this issue and the next steps in troubleshooting?

2019-04-14 04:00:55,380 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-04-14 04:00:55,380 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2019-04-14 04:00:56,160 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-04-14 04:00:56,190 [JobControl] INFO org.apache.hadoop.hdfs.DFSClient - Created token for func_svc_dig_11ent: HDFS_DELEGATION_TOKEN owner=func_abc@VSP.SAS.COM, renewer=yarn, realUser=, issueDate=1555228856156, maxDate=1555836656176, sequenceNumber=96594554, masterKeyId=1976 on ha-hdfs:nameservice1
2019-04-14 04:00:56,367 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:func_abc@VSP.SAS.COM (auth:KERBEROS) cause:org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Connection refused)
2019-04-14 04:00:56,370 [JobControl] WARN org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider - KMS provider at [https://cdn84au.xxx.xxx.com:16000/kms/v1/] threw an IOException:
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
 at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:1024)
 at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:193)
 at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:190)
 at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:123)
 at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.addDelegationTokens(LoadBalancingKMSClientProvider.java:190)
 at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:110)
 at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2333)
 at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:140)
 at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
 at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
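Not part of the original post, but one way to narrow down intermittent LoadBalancingKMSClientProvider failures like this is to query each KMS instance directly instead of going through the load-balanced provider, so a failing node shows up on its own. A minimal sketch, assuming a valid Kerberos ticket and using the host from the trace plus a placeholder for the second Key Trustee KMS server:

```shell
# Confirm the Kerberos credentials the job would use
klist

# Query each KMS instance directly (bypasses LoadBalancingKMSClientProvider);
# <second-kms-host> is a placeholder for the other Key Trustee KMS server
hadoop key list -provider kms://https@cdn84au.xxx.xxx.com:16000/kms
hadoop key list -provider kms://https@<second-kms-host>:16000/kms

# Check basic reachability of the KMS endpoint from the node running the job;
# "Connection refused" inside the GSSException usually points here
curl -vk https://cdn84au.xxx.xxx.com:16000/kms/v1/
```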
Labels:
- Cloudera Navigator
- Kerberos
12-06-2018
01:33 PM
1 Kudo
Yes, we only tried deleting the out-of-sync partition, and it did not work. After a lot of research we decided to increase replica.lag.time.max.ms to 8 days, since a few replicas had been out of sync for around 8 days. This resolved our issue, although it took a few hours for the followers to fetch and replicate the 7 days of data. https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/ helped us understand ISRs.
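For anyone applying the same workaround, this is a broker-level setting; roughly, the change looks like the sketch below (691200000 is simply 8 days expressed in milliseconds, and whether a rolling restart is needed depends on your Kafka version and how the config is managed):

```shell
# replica.lag.time.max.ms is set per broker in server.properties
# (or via the Kafka broker safety valve in Cloudera Manager).

# 8 days expressed in milliseconds:
echo $((8 * 24 * 60 * 60 * 1000))   # prints 691200000

# Resulting broker config line:
#   replica.lag.time.max.ms=691200000
# Restart the brokers so the new value takes effect, then watch the
# under-replicated partition count drop as followers catch up.
```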
12-05-2018
07:44 AM
I agree with your suggestion and we are in the process of testing this in staging. Unfortunately we don't want to try it on the problematic cluster, which is production, as we might corrupt something. It's hard to replicate the issue in the staging environment, but we can at least try an rm -rf of one replica, restart the broker, and see how it behaves. After doing some research, this is the issue we are facing: https://issues.apache.org/jira/browse/KAFKA-6361
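A rough sketch of that staging test, in case it helps someone else. The data directory and topic-partition below are placeholders taken from elsewhere in this thread, and this should only be tried where losing the local copy of the replica is acceptable, since the broker has to re-fetch the whole partition from the leader:

```shell
# Staging only. Stop the broker first (via Cloudera Manager or your
# service manager) before touching its log directories.

# Back up, then remove, the follower's partition directory
# (/kafka/data/sdc and dev-raw-events-35 are placeholders)
cp -a /kafka/data/sdc/dev-raw-events-35 /var/tmp/dev-raw-events-35.bkp
rm -rf /kafka/data/sdc/dev-raw-events-35

# Start the broker again; the replica fetcher should recreate the
# partition from the current leader and eventually rejoin the ISR.
```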
11-27-2018
07:45 AM
Is there a way I can force the replica to catch up with the leader? The replica has been out of sync for over 24 hours, and unclean.leader.election is enabled in the cluster.

What I have tried so far:
- Restarted the broker: I don't see any movement, and the set of under-replicated partitions changes.
- Moved the replica to a different broker: it was unsuccessful and the reassignment got stuck.
- Created an additional replica: kafka-reassign-partitions is also stuck waiting for the out-of-sync replica to catch up to the leader.

Logs:
ERROR kafka.server.ReplicaFetcherThread: [ReplicaFetcher replicaId=99, leaderId=157, fetcherId=0] Error due to kafka.common.KafkaException: Error processing data for partition dev-raw-events-35 offset 111478948
Caused by: kafka.common.UnexpectedAppendOffsetException: Unexpected offset in append to dev-raw-events-35. First offset or last offset of the first batch 111478933 is less than the next offset 111478948. First 10 offsets in append: List(111478933, 111478934, 111478935, 111478936, 111478937, 111478938, 111478939, 111478940, 111478941, 111478942), last offset in append: 111479224. Log start offset = 95104666

What can I do to fix this issue?
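Not from the original post, but for anyone hitting the same symptom, these are the commands typically used to see which partitions are under-replicated and whether a reassignment is actually progressing (the ZooKeeper connect string and file name are placeholders; the topic name is the one from the log above):

```shell
# List partitions whose ISR is smaller than the replica set
kafka-topics --zookeeper zk1.example.com:2181 --describe --under-replicated-partitions

# Show leader, replicas and ISR for the affected topic
kafka-topics --zookeeper zk1.example.com:2181 --describe --topic dev-raw-events

# Check whether an in-flight reassignment has completed
kafka-reassign-partitions --zookeeper zk1.example.com:2181 \
  --reassignment-json-file reassignment.json --verify
```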
Labels:
- Apache Kafka
11-21-2018
08:05 AM
As a workaround, what I did was back up this directory and restart the Navigator Metadata Server. It will rebuild all indexes from scratch; just verify your audits after the restart.

sudo mv /dcf/hdp/cloudera-scm-navigator /dcf/hdp/cloudera-scm-navigator.bkp

Then restart Navigator.
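The same workaround with the stop/start steps spelled out; the path is the one from this cluster, and stopping/starting the role is assumed to be done from Cloudera Manager:

```shell
# 1) Stop the Navigator Metadata Server role in Cloudera Manager.

# 2) Move the existing storage directory aside so it gets rebuilt.
sudo mv /dcf/hdp/cloudera-scm-navigator /dcf/hdp/cloudera-scm-navigator.bkp

# 3) Start the Navigator Metadata Server role again; it rebuilds the
#    indexes from scratch, so verify audits/lineage once it catches up.
```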
11-18-2018
05:38 AM
Thank you @bgooley, that's what I did and it worked. Thank you, I appreciate your response.
11-16-2018
01:11 PM
Updating indexes every 3600 seconds
Location of indexes: /xxx/hadoop/cloudera-scm-headlamp
Exception occurred at November 16, 2018 9:01:16 PM +00:00
java.io.IOException: No sub-file with id .fnm found (fileName=_1d.cfs files: [])
 at org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:155)
 at org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:144)
 at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:74)
 at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:73)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:93)
 at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
 at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
 at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
 at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754)
 at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:421)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:281)
 at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:89)
 at com.cloudera.headlamp.HeadlampServiceImpl.setData(HeadlampServiceImpl.java:210)
 at com.cloudera.headlamp.HeadlampIndex.loadNewSearchIndex(HeadlampIndex.java:220)
 at com.cloudera.headlamp.HeadlampIndex.<init>(HeadlampIndex.java:101)
 at com.cloudera.headlamp.HeadlampIndexManager.getOrCreateIndex(HeadlampIndexManager.java:171)
 at com.cloudera.headlamp.HeadlampIndexManager.reindexIndexes(HeadlampIndexManager.java:235)
 at com.cloudera.headlamp.HeadlampIndexManager.access$100(HeadlampIndexManager.java:58)
 at com.cloudera.headlamp.HeadlampIndexManager$1.run(HeadlampIndexManager.java:492)
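For readers who hit this same Lucene error from the Reports Manager (headlamp) index: one common recovery pattern, not confirmed in this thread as the exact fix, is to move the corrupted index directory aside and let it rebuild (the path below is the one reported above):

```shell
# Stop the Reports Manager role in Cloudera Manager first.

# Move the corrupted headlamp index directory out of the way.
sudo mv /xxx/hadoop/cloudera-scm-headlamp /xxx/hadoop/cloudera-scm-headlamp.bkp

# Start Reports Manager again; it rebuilds its index from HDFS metadata,
# which can take a while on large clusters.
```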
Labels:
- Cloudera Manager
11-16-2018
07:25 AM
2 Kudos
This was an issue with that consumer group in __consumer_offsets, and these are the steps we took to fix it.

On a single broker, run the following:

1) Generate a DumpLogSegments command for every __consumer_offsets log file:

find /kafka/data -name "*.log" | grep -i consumer | awk '{a=$1;b="kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files "a; print b}'

2) Run each generated command on that broker to see which log file contains the consumer group "prod-abc-events", for example:

kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log | grep -i 'prod-abc-events'

Repeat the steps above on all the brokers and make a list of all the files that reference 'prod-abc-events'. In our instance we found 3 files that referenced this group:

broker1: /kafka/data/sda/__consumer_offsets-24/00000000000000000000.log
broker2: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log
broker3: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log

We noticed that the .log file on broker1 was different in size and content from the remaining two. We backed up the file from broker1 and then replaced it with the one from broker2, and that resolved the issue. Most likely this happened when we ran kafka-reassign-partitions, the drives reached 99%, and then something broke in __consumer_offsets.
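A compact way to run the same search on a broker, folding steps 1) and 2) into one loop; the data path and group name are the ones from this post, so adjust them to your layout:

```shell
#!/bin/bash
# For every __consumer_offsets log segment on this broker, dump it and
# check whether it references the problematic consumer group.
GROUP="prod-abc-events"

find /kafka/data -name "*.log" | grep -i consumer | while read -r f; do
  if kafka-run-class kafka.tools.DumpLogSegments --deep-iteration \
       --print-data-log -files "$f" | grep -qi "$GROUP"; then
    echo "group $GROUP found in: $f"
  fi
done
```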