Member since: 08-08-2017

1652 Posts
30 Kudos Received
11 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1910 | 06-15-2020 05:23 AM |
|  | 15411 | 01-30-2020 08:04 PM |
|  | 2045 | 07-07-2019 09:06 PM |
|  | 8090 | 01-27-2018 10:17 PM |
|  | 4555 | 12-31-2017 10:12 PM |
02-26-2023 10:29 PM
Hello Mike,

Your existing 3 ZooKeeper nodes can likely serve your expansion requirements. You can monitor the CPU and network of the ZooKeeper nodes while your Kafka cluster is growing, and when you reach the throughput limit you can expand your ZooKeeper ensemble to 5 nodes. Remember that the ZooKeeper nodes need to stay in sync all the time, so the more ZooKeeper nodes you have, the more traffic is added to keep them in sync while those nodes are also handling the Kafka requests; it doesn't mean that more is better.

I would suggest staying with 3 ZooKeeper nodes while expanding your Kafka cluster under close monitoring, and considering growing to 5 when the CPU/network throughput reaches its limit.

You can also consider tuning the ZooKeeper nodes, e.g. dedicated disks, better network throughput, isolating the ZooKeeper process, and disabling swap.
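To make the monitoring suggestion concrete, here is a minimal sketch (not from the original answer) that samples ZooKeeper's built-in `mntr` counters from the shell; the hostnames and port 2181 are placeholders, and on newer ZooKeeper releases the four-letter-word commands may need to be whitelisted via `4lw.commands.whitelist`.

# Sketch: sample latency and connection counters from each ZooKeeper node
# (zk01..zk03 and port 2181 are placeholder values)
for zk in zk01 zk02 zk03; do
  echo "== $zk =="
  echo mntr | nc "$zk" 2181 | egrep 'zk_avg_latency|zk_max_latency|zk_num_alive_connections|zk_outstanding_requests'
done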
						
					
02-22-2023 08:39 AM
We have an HDP cluster, version 2.6.5.

When we look at the NameNode logs we can see the following warnings:

2023-02-20 15:58:31,377 WARN  server.Journal (Journal.java:journal(398)) - Sync of transaction
2023-02-20 16:00:39,037 WARN  server.Journal (Journal.java:journal(398)) - Sync of transaction
2023-02-20 16:01:43,962 WARN  server.Journal (Journal.java:journal(398)) - Sync of transaction range 193594954980-193594954980 took 1329ms
2023-02-20 16:02:47,129 WARN  server.Journal (Journal.java:journal(398)) - Sync of transaction range 193595018764-193595018764 took 1321ms
2023-02-20 16:03:52,763 WARN  server.Journal (Journal.java:journal(398)) - Sync of transaction range 193595106645-193595106646 took 1344ms
2023-02-20 16:04:56,276 WARN  server.Journal (Journal.java:journal(398)) - Sync of transaction range 193595175233-193595175233 took 1678ms
2023-02-20 16:06:01,067 WARN  server.Journal (Journal.java:journal(398)) - Sync of transaction range 193595252052-193595252052 took 1265ms
2023-02-20 16:07:06,447 WARN  server.Journal (Journal.java:journal(398)) - Sync of transaction range 193595320796-193595320796 took 1273ms

In our HDP cluster the HDFS service includes 2 NameNode services and 3 JournalNodes, and the cluster includes 736 DataNode machines; the HDFS service manages all of the DataNodes.

We want to understand the reason for this warning, and how to avoid these messages with a proactive solution:

server.Journal (Journal.java:journal(398)) - Sync of transaction range 193595018764-193595018764 took 1321ms
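Not part of the original question, but since a common first suspect for slow sync warnings is the latency of the disk behind the JournalNode edits directory, a hedged starting point for proactive monitoring might look like the sketch below; the edits path is a placeholder and should be taken from `dfs.journalnode.edits.dir` in hdfs-site.xml.

# Sketch: check which mount holds the JournalNode edits dir and watch its latency
# (/hadoop/hdfs/journal is a placeholder path)
df -h /hadoop/hdfs/journal

# extended iostat on all devices every 5 seconds; watch await/%util for that disk
iostat -dx 5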
						
					
Labels: Ambari Blueprints
01-24-2023 09:23 AM
We have an HDP cluster, version 2.6.5, with the Ambari platform. Here is an example from our Ambari lab cluster with 5 NodeManager machines.

Regarding the YARN service: is it possible to add a widget in Ambari that shows the CPU core consumption?

If not, what are the other ways to find the cores consumed by YARN from the CLI?

Another way that we found is the `resource_manager:8088/cluster` page, as in the following.

So is it possible to find some API / CLI that can capture the VCores Used?
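One option that should work on HDP 2.6.5 (a hedged sketch, not from the original post) is the ResourceManager REST API, whose cluster-metrics endpoint reports vcore counters; the RM hostname below is a placeholder, and the exact field names are worth confirming against your Hadoop version.

# Sketch: query vcore usage from the active ResourceManager
# (<rm-host> is a placeholder)
curl -s http://<rm-host>:8088/ws/v1/cluster/metrics
# look for "allocatedVirtualCores", "availableVirtualCores" and "totalVirtualCores" in the JSON

Depending on the exact Hadoop build, the `yarn top` command may also be available from the CLI and prints a VCores Used/Total summary.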
						
					
Labels: Ambari Blueprints
01-24-2023 08:41 AM
We have a Spark production cluster with the YARN service (based on HDP version 2.6.5).

The total number of NodeManager services is 745 (actually 745 Linux machines), and the active and standby YARN ResourceManagers are installed on different master machines.

We found that the following parameters are not defined in our YARN configuration (yarn-site.xml):

yarn.scheduler.increment-allocation-vcores
yarn.scheduler.increment-allocation-mb

These parameters are defined neither in Ambari nor in the YARN XML configuration files.

I want to know the meaning of the parameter yarn.scheduler.increment-allocation-vcores, and what the effect is if these parameters are not defined in our configuration.

From YARN best-practice configuration guides we understand that both parameters are part of the YARN configuration, but we are not sure whether we must add them to the YARN custom configuration.

From the documentation we found:

Minimum and maximum allocation unit in YARN

Two resources—memory and CPU, as of Hadoop 2.5.1—have minimum and maximum allocation units in YARN, as set by the configurations in yarn-site.xml. Basically, it means the RM can only allocate memory to containers in increments of “yarn.scheduler.minimum-allocation-mb” and not exceed “yarn.scheduler.maximum-allocation-mb”. It can only allocate CPU vcores to containers in increments of “yarn.scheduler.minimum-allocation-vcores” and not exceed “yarn.scheduler.maximum-allocation-vcores”. If changes are required, set the above configurations in yarn-site.xml on the RM nodes, and restart the RM.

References:
https://docs.trifacta.com/display/r076/Tune+Cluster+Performance
https://stackoverflow.com/questions/58522138/how-to-control-yarn-container-allocation-increment-properly
https://pratikbarjatya.github.io/learning/best-practices-for-yarn-resource-management/
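For illustration only (not from the original post): the allocation-unit settings quoted above would appear in yarn-site.xml roughly as below; the numbers are placeholder examples, not tuning advice for a 745-node cluster. As far as I know, the two `yarn.scheduler.increment-allocation-*` properties are only read by the Fair Scheduler, while with the Capacity Scheduler the minimum-allocation values effectively act as the rounding increment, so their absence is not necessarily a misconfiguration.

# Sketch only: placeholder example values, shown in key=value form
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=8192
yarn.scheduler.minimum-allocation-vcores=1
yarn.scheduler.maximum-allocation-vcores=8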
						
					
Labels: Ambari Blueprints
01-12-2023 12:09 AM
I also want to say that restarting the NodeManager, or fully restarting the YARN service, fixed the problem, but as you know this isn't a proper solution to apply every time one of the NodeManagers dies.
						
					
12-23-2022 11:10 AM
I have the same problem, but in my case the JVM pause detections happen every 15 minutes and report pauses between 28337ms and 853466ms.
						
					
12-09-2022 05:24 AM

2 Kudos
The following areas normally cause this problem:

1) The connection from the Ambari agent host to the Ambari Server was lost.
2) A firewall issue blocked the connections.
3) The hostname and IP address are not set correctly in /etc/hosts.

You can compare the output using this API:

> curl -u user:password http://AmbariHost:8080/api/v1/hosts
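As a hedged addition to the checklist above (not part of the original answer), these are commands typically run on the suspect host to verify points 1) and 3); hostnames and credentials are placeholders.

# Sketch: checks on the agent host that stopped heartbeating
ambari-agent status          # is the agent process running?
hostname -f                  # FQDN the agent reports
getent hosts $(hostname -f)  # confirm /etc/hosts (or DNS) resolves it to the right IP

# then compare with what the server knows
curl -u user:password http://AmbariHost:8080/api/v1/hosts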
						
					
11-29-2022 08:07 AM
We have 3 Kafka broker machines on RHEL 7.9 (each machine is strong physical `DELL HW`: memory = 512G and 96 CPU cores). The Kafka cluster is in production mode.

The Kafka version is 2.7.x, and the Kafka disks are in a JBOD configuration; each Kafka broker has 8 JBOD disks, as we can see from the following `df -h` details:

df -h
/dev/sdc 1.7T 929G 748G 56% /kafka/kafka_logs2
/dev/sdd 1.7T 950G 727G 57% /kafka/kafka_logs3
/dev/sde 1.7T 999G 678G 60% /kafka/kafka_logs4
/dev/sdf 1.7T 971G 706G 58% /kafka/kafka_logs5
/dev/sdg 1.7T 1.7T 20K 100% /kafka/kafka-logs6 <-----------------
/dev/sdh 1.7T 962G 714G 58% /kafka/kafka_logs7
/dev/sdi 1.7T 1.1T 621G 63% /kafka/kafka_logs8

As we can see above, the disk `/kafka/kafka_logs6` is `100%` used.

After a short investigation we found that the Kafka broker isn't tolerant when one disk fails or reaches 100%; as a result, that Kafka broker is now down. Here is the Kafka `server.log`:

[2022-11-29 15:43:59,723] ERROR Error while writing to checkpoint file /kafka/kafka-logs6 .............
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at java.io.BufferedWriter.flush(BufferedWriter.java:254)
at kafka.server.checkpoints.CheckpointFile.liftedTree1$1(CheckpointFile.scala:108)
at kafka.server.checkpoints.CheckpointFile.write(CheckpointFile.scala:92)
at kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:70)
at kafka.server.epoch.LeaderEpochFileCache.flush(LeaderEpochFileCache.scala:292)
at kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromEnd$1(LeaderEpochFileCache.scala:238)
at kafka.server.epoch.LeaderEpochFileCache.truncateFromEnd(LeaderEpochFileCache.scala:235)
at kafka.log.Log.$anonfun$new$1(Log.scala:305)
at kafka.log.Log.<init>(Log.scala:305)
at kafka.log.Log$.apply(Log.scala:2549)
at kafka.log.LogManager.loadLog(LogManager.scala:273)
at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:352)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

**From my perspective**, when we have **8 disks** on each broker and one disk fails (for example, reaches 100%), we expect the Kafka broker to stay alive even with one failed disk.

As a result of the above scenario, we searched the Kafka `server.properties` for a parameter that could help us configure the Kafka broker to be tolerant when one disk fails, but we did not find it, or maybe we do not know what to set in order to make the Kafka broker tolerant of a single disk failure.

The full parameters are:

more server.properties
auto.create.topics.enable=false  auto.leader.rebalance.enable=true  background.threads=10  log.retention.bytes=-1  log.retention.hours=48  delete.topic.enable=true  leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10  log.dir=/kafka/kafka-logs2,/kafka/kafka-logs3 ...............  log.flush.interval.messages=9223372036854775807  log.flush.interval.ms=1000  log.flush.offset.checkpoint.interval.ms=60000  log.flush.scheduler.interval.ms=9223372036854775807  log.flush.start.offset.checkpoint.interval.ms=60000  compression.type=producer  log.roll.jitter.hours=0  log.segment.bytes=1073741824  log.segment.delete.delay.ms=60000  message.max.bytes=1000012  min.insync.replicas=1  num.io.threads=10  num.network.threads=48  num.recovery.threads.per.data.dir=1  num.replica.fetchers=1  offset.metadata.max.bytes=4096  offsets.commit.required.acks=-1  offsets.commit.timeout.ms=5000  offsets.load.buffer.size=5242880  offsets.retention.check.interval.ms=600000  offsets.retention.minutes=10080  offsets.topic.compression.codec=0  offsets.topic.num.partitions=50  offsets.topic.replication.factor=3  offsets.topic.segment.bytes=104857600  queued.max.requests=1000  quota.consumer.default=9223372036854775807  quota.producer.default=9223372036854775807  replica.fetch.min.bytes=1  replica.fetch.wait.max.ms=500  replica.high.watermark.checkpoint.interval.ms=5000  replica.lag.time.max.ms=10000  replica.socket.receive.buffer.bytes=65536  replica.socket.timeout.ms=30000  request.timeout.ms=30000  socket.receive.buffer.bytes=102400  socket.request.max.bytes=104857600  socket.send.buffer.bytes=102400  transaction.max.timeout.ms=900000  transaction.state.log.load.buffer.size=5242880  transaction.state.log.min.isr=2  transaction.state.log.num.partitions=50  transaction.state.log.replication.factor=3  transaction.state.log.segment.bytes=104857600  transactional.id.expiration.ms=604800000  unclean.leader.election.enable=false  zookeeper.connection.timeout.ms=600000  zookeeper.max.in.flight.requests=10  zookeeper.session.timeout.ms=600000  zookeeper.set.acl=false  broker.id.generation.enable=true  connections.max.idle.ms=600000  connections.max.reauth.ms=0  controlled.shutdown.enable=true  controlled.shutdown.max.retries=3  controlled.shutdown.retry.backoff.ms=5000  controller.socket.timeout.ms=30000  default.replication.factor=3  delegation.token.expiry.time.ms=86400000  delegation.token.max.lifetime.ms=604800000  delete.records.purgatory.purge.interval.requests=1  fetch.purgatory.purge.interval.requests=1000  group.initial.rebalance.delay.ms=3000  group.max.session.timeout.ms=1800000  group.max.size=2147483647  group.min.session.timeout.ms=6000  log.cleaner.backoff.ms=15000  log.cleaner.dedupe.buffer.size=134217728  log.cleaner.delete.retention.ms=86400000  log.cleaner.enable=true  log.cleaner.io.buffer.load.factor=0.9  log.cleaner.io.buffer.size=524288  log.cleaner.io.max.bytes.per.second=1.7976931348623157e308  log.cleaner.max.compaction.lag.ms=9223372036854775807  log.cleaner.min.cleanable.ratio=0.5  log.cleaner.min.compaction.lag.ms=0  log.cleaner.threads=1  log.cleanup.policy=delete  log.index.interval.bytes=4096  log.index.size.max.bytes=10485760  log.message.timestamp.difference.max.ms=9223372036854775807  log.message.timestamp.type=CreateTime  log.preallocate=false  log.retention.check.interval.ms=300000  max.connections=2147483647  max.connections.per.ip=2147483647  max.incremental.fetch.session.cache.slots=1000  num.partitions=1  producer.purgatory.purge.interval.requests=1000  queued.max.request.bytes=-1  replica.fetch.backoff.ms=1000  replica.fetch.max.bytes=1048576  replica.fetch.response.max.bytes=10485760  reserved.broker.max.id=1500  
transaction.abort.timed.out.transaction.cleanup.interval.ms=60000  transaction.remove.expired.transaction.cleanup.interval.ms=3600000  zookeeper.sync.time.ms=2000  broker.rack=/default-rack

I want to add my personal feeling, just to show the absurdity of the above scenario: let's say we had 100 disks (in JBOD) on each Kafka broker. Does it make sense that the Kafka broker would shut down just because one disk failed?
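Not part of the original question, but for context: as far as I understand, since Kafka 1.0 a broker is designed to keep running with a failed JBOD log directory as long as at least one directory stays healthy, and the per-directory state can be inspected with the `kafka-log-dirs` tool. A hedged sketch follows; the bootstrap server and broker id are placeholders, and the script may be named `kafka-log-dirs.sh` or `kafka-log-dirs` depending on packaging.

# Sketch: describe the log dirs of one broker as seen by the cluster
# (kafka01:9092 and broker id 1001 are placeholder values)
kafka-log-dirs.sh --bootstrap-server kafka01:9092 --describe --broker-list 1001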
						
					
Labels: Apache Kafka
11-16-2022 09:28 AM
In our Hadoop cluster we have 3 Kafka machines, which include the following services:

on kafka01: Confluent Kafka service, Confluent Schema Registry service, ZooKeeper service
on kafka02: Confluent Kafka service, Confluent Schema Registry service, ZooKeeper service
on kafka03: Confluent Kafka service, Confluent Schema Registry service, ZooKeeper service

On our Kafka cluster we have 34 different topics. One of them is the topic `car.to.go`; this topic has 3 replicas in the Kafka cluster.

What we found regarding the Schema Registry service is that the topic `car.to.go` has different versions on kafka03 than on the other machines, `kafka01/02`, so the versions are not in sync. Here is an example:

[root@kafka01 ~]# curl -X GET http://kafka01:8081/subjects/car.to.go-value/versions
[1,2,3,4,5,6,7]
[root@kafka01 ~]# curl -X GET http://kafka02:8081/subjects/car.to.go-value/versions
[1,2,3,4,5,6,7]
[root@kafka01 ~]# curl -X GET http://kafka03:8081/subjects/car.to.go-value/versions
[1,2,3,4]

Given the above, what could be the reason that the versions on kafka03 differ from kafka01/02 for the topic `car.to.go`, and what is the right way to fix this issue so that kafka03 also reports the versions `1,2,3,4,5,6,7`?

Notes: all Kafka services, Schema Registry services, and ZooKeeper services are up and running.

Links:
https://github.com/confluentinc/schema-registry/blob/master/README.md
https://docs.confluent.io/platform/current/schema-registry/develop/using.html#check-if-a-schema-is-registered-under-subject-kafka-key
https://kafkawize.com/2019/03/17/commands-for-schemaregistry/
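A small sketch (not from the original post, and not a fix for the root cause) for polling the three registry instances side by side, reusing the hostnames and subject from the question above:

# Sketch: compare what each Schema Registry instance returns for the same subject
for h in kafka01 kafka02 kafka03; do
  echo -n "$h: "
  curl -s http://$h:8081/subjects/car.to.go-value/versions
  echo
done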
						
					
Labels: Apache Hadoop
10-28-2022 02:24 AM
We have an Ambari Hadoop cluster installed with HDP version 2.6.5; all machines in the cluster run RHEL 7.9.

The cluster of course includes the YARN service with two ResourceManager services (running on the master1 and master2 nodes), and we are facing the following alert:

Connection failed to http://master2.start.com:8088 (timed out)

We tested the alert with the following `wget` approach. When the alert appears in Ambari, the `wget` test hangs, and sometimes it takes a while until `wget` finishes with results:

[root@master2 yarn]# wget http://master2.start.com:8088
--2022-10-28 08:12:49-- http://master2.start.com:8088/
Resolving master2.start.com (master2.start.com)... 172.3.45.68
Connecting to master2.start.com (master2.start.com)|172.3.45.68|:8088... connected.
HTTP request sent, awaiting response... 307 TEMPORARY_REDIRECT
Location: http://master1.start.com:8088/ [following]
--2022-10-28 08:12:50-- http://master1.start.com:8088/
Resolving master1.start.com (master1.start.com)... 172.3.45.61
Connecting to master1.start.com (master01.start.com)|172.3.45.61|:8088... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://master1.start.com:8088/cluster [following]
--2022-10-28 08:12:50-- http://master1.start.com:8088/cluster
Reusing existing connection to master1.start.com:8088.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html.35’

[ <=> ] 5,419,141 9.46MB/s in 0.5s

2022-10-28 08:12:52 (9.46 MB/s) - ‘index.html.35’ saved [5419141]

Port 8088 is listening on both nodes:

ps -ef | grep `lsof -i :8088 | grep -i listen | awk '{print $2}'`
yarn 1977 1 16 Oct27 ?
02:37:32 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_resourcemanager

We also checked with jps:

jps -l | grep -i resourcemanager
1977 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

We also checked the ResourceManager logs and we see the following:

2022-10-27 08:04:30,071 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at org.apache.hadoop.yarn.api.records.ContainerId.toString(ContainerId.java:196)
at org.apache.hadoop.yarn.util.ConverterUtils.toString(ConverterUtils.java:165)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:169)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:603)
at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)

2022-10-27 08:04:32,056 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at org.apache.hadoop.yarn.api.records.ContainerId.toString(ContainerId.java:196)
at org.apache.hadoop.yarn.util.ConverterUtils.toString(ConverterUtils.java:165)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:169)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:603)
at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResource

2022-10-27 08:05:43,170 ERROR recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailedInternal(992)) - State store operation failed
org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1213)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1001)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)

2022-10-27 08:05:49,584 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(659)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

2022-10-28 08:43:31,259 ERROR metrics.SystemMetricsPublisher (SystemMetricsPublisher.java:putEntity(549)) - Error when publishing entity [YARN_APPLICATION,application_1664925617878_1896]
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream closed.
at com.sun.jersey.api.client.ClientResponse.bufferEntity(ClientResponse.java:583)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:157)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:348)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:536)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationACLsUpdatedEvent(SystemMetricsPublisher.java:392)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:257)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:564)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:559)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Stream closed.
at java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:470)
at java.net.SocketInputStream.available(SocketInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)

It is still not clear from the above logs why the ResourceManagers raise the alert `Connection failed to http://master2.start.com:8088 (timed out)`.
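Not part of the original post, but since the `wget` output shows master2 answering with a 307 redirect to master1 (i.e. master2 is the standby RM), one hedged check is the HA state of each ResourceManager; `rm1`/`rm2` below are placeholders for whatever `yarn.resourcemanager.ha.rm-ids` is set to in yarn-site.xml.

# Sketch: confirm which ResourceManager is active and which is standby
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# the HA state is also exposed over HTTP on each RM
curl -s http://master2.start.com:8088/ws/v1/cluster/info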
						
					
Labels: Apache Hadoop