App time line server is going down regularly

Hi everyone,

I am having an issue with my App Timeline Server: when I restart it, it comes up, but after some time it goes down again.

I also found that in the ResourceManager UI the "Apps Pending" count keeps increasing. I am attaching a screenshot and the log file. Can you please help me solve this?

85500-rm-ui.jpg

2018-08-03 10:26:38,430 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.serviceInit(RollingLevelDBTimelineStore.java:324)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.serviceInit(EntityGroupFSTimelineStore.java:151)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:168)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:178)
2018-08-03 10:26:38,504 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service EntityGroupFSTimelineStore failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.serviceInit(EntityGroupFSTimelineStore.java:151)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:168)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:178)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.serviceInit(RollingLevelDBTimelineStore.java:324)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    ... 7 more
2018-08-03 10:26:38,508 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:serviceStop(297)) - Stopping EntityGroupFSTimelineStore
2018-08-03 10:26:38,509 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.serviceInit(EntityGroupFSTimelineStore.java:151)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:168)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:178)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.serviceInit(RollingLevelDBTimelineStore.java:324)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    ... 7 more
2018-08-03 10:26:38,510 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(211)) - Stopping ApplicationHistoryServer metrics system...
2018-08-03 10:26:38,511 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread interrupted.
2018-08-03 10:26:38,513 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(217)) - ApplicationHistoryServer metrics system stopped.
2018-08-03 10:26:38,513 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - ApplicationHistoryServer metrics system shutdown complete.
2018-08-03 10:26:38,513 FATAL applicationhistoryservice.ApplicationHistoryServer (ApplicationHistoryServer.java:launchAppHistoryServer(171)) - Error starting ApplicationHistoryServer
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.serviceInit(EntityGroupFSTimelineStore.java:151)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:168)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:178)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store/starttime-ldb/001547.sst: Too many open files
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.serviceInit(RollingLevelDBTimelineStore.java:324)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    ... 7 more
2018-08-03 10:26:38,515 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status -1
2018-08-03 10:26:38,515 INFO timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:run(416)) - Closing HadoopTimelineMetricSink. Flushing metrics to collector...
2018-08-03 10:26:38,561 INFO applicationhistoryservice.ApplicationHistoryServer (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer at ip

Thanks in advance.

1 REPLY

Re: App time line server is going down regularly

@kanna k Each rolling LevelDB instance (one per hour by default) can hold up to 1000 open files (yarn.timeline-service.leveldb-timeline-store.max-open-files), with each file up to 2 MB (max_file_size). The timeline store keeps every LevelDB instance open until it expires, so the number of open files grows with the number of unexpired instances.
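To see why the rolling period matters, here is a rough back-of-the-envelope sketch in Python. It assumes the model described above (one LevelDB instance per rolling period, each kept open until the TTL expires, each allowed up to max-open-files descriptors). The function name and the period-to-hours mapping are only illustrative, and the real store may open more or fewer files than this upper bound suggests.

```python
import math

MS_PER_HOUR = 60 * 60 * 1000

# Illustrative mapping from rolling period to hours covered by one instance.
ROLLING_PERIOD_HOURS = {"hourly": 1, "half_daily": 12, "daily": 24}


def estimate_open_files(ttl_ms, rolling_period, max_open_files):
    """Rough upper bound on open files across all unexpired instances."""
    period_ms = ROLLING_PERIOD_HOURS[rolling_period] * MS_PER_HOUR
    # Number of rolling instances that can be alive within the TTL window.
    instances = math.ceil(ttl_ms / period_ms)
    return instances * max_open_files


# 7-day TTL with hourly rolling and 1000 files per instance
print(estimate_open_files(604_800_000, "hourly", 1000))  # 168000
# 7-day TTL with daily rolling and 500 files per instance
print(estimate_open_files(604_800_000, "daily", 500))    # 3500
```

Under these assumptions, a 7-day TTL with hourly rolling allows 168 instances, while daily rolling allows 7, which is why a longer rolling period and a lower max-open-files value reduce the pressure on the process's file descriptor limit.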


The following tunables should help resolve this issue:

yarn.timeline-service.ttl-ms=604800000

Please add these in "Custom yarn-site" under "Advanced" in the YARN configs:

yarn.timeline-service.rolling-period=daily
yarn.timeline-service.leveldb-timeline-store.read-cache-size=4194304
yarn.timeline-service.leveldb-timeline-store.write-buffer-size=4194304
yarn.timeline-service.leveldb-timeline-store.max-open-files=500
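
If the cluster is not managed through Ambari, the same settings would normally go into yarn-site.xml as property elements. The small Python sketch below just renders the key/value pairs listed above in that form. The helper name is made up for illustration, and the output should be reviewed before pasting it inside the existing configuration element of your yarn-site.xml.

```python
from xml.sax.saxutils import escape

# The settings suggested in this reply, as plain key/value pairs.
SETTINGS = {
    "yarn.timeline-service.ttl-ms": "604800000",
    "yarn.timeline-service.rolling-period": "daily",
    "yarn.timeline-service.leveldb-timeline-store.read-cache-size": "4194304",
    "yarn.timeline-service.leveldb-timeline-store.write-buffer-size": "4194304",
    "yarn.timeline-service.leveldb-timeline-store.max-open-files": "500",
}


def to_yarn_site_properties(settings):
    """Render key/value pairs as Hadoop-style <property> blocks."""
    blocks = []
    for name, value in settings.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(str(value))}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


print(to_yarn_site_properties(SETTINGS))
```

The App Timeline Server needs a restart after the change for the new values to take effect.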