<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Yarn timeline server periodically fails in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98707#M12084</link>
    <description>&lt;P&gt;Further details on this.&lt;/P&gt;&lt;H3&gt;Configuring the Spark History Server to Use HDFS&lt;/H3&gt;&lt;P&gt;&lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/config-shs-hdfs.html"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/config-shs-hdfs.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 25 Mar 2016 03:16:44 GMT</pubDate>
    <dc:creator>andrew_sears</dc:creator>
    <dc:date>2016-03-25T03:16:44Z</dc:date>
    <item>
      <title>Yarn timeline server periodically fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98703#M12080</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm using HDP 2.3.0, and the YARN Application Timeline Server fails periodically. Checking the timeline server log, the cause is a GC overhead limit exceeded error:&lt;/P&gt;&lt;PRE&gt;2015-12-02 12:48:56,548 ERROR mortbay.log (Slf4jLog.java:warn(87)) - /ws/v1/timeline/spark_event_v01
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.codehaus.jackson.util.TextBuffer.contentsAsString(TextBuffer.java:350)
        at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:278)
        at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
        at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:204)
        at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
        at org.codehaus.jackson.map.ObjectReader._bindAndClose(ObjectReader.java:768)
        at org.codehaus.jackson.map.ObjectReader.readValue(ObjectReader.java:486)
        at org.apache.hadoop.yarn.server.timeline.GenericObjectMapper.read(GenericObjectMapper.java:93)
        at org.apache.hadoop.yarn.server.timeline.GenericObjectMapper.read(GenericObjectMapper.java:77)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getEntityEvent(LeveldbTimelineStore.java:1188)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getEntity(LeveldbTimelineStore.java:437)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getEntityByTime(LeveldbTimelineStore.java:685)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getEntities(LeveldbTimelineStore.java:557)
        at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:134)
        at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:119)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)&lt;/PRE&gt;&lt;P&gt;It seems that the timeline server fails to delete old LevelDB data, so every time it must load a large volume of old entries, which causes the GC overhead. The log contains many lines like the following:&lt;/P&gt;&lt;PRE&gt;2015-12-02 12:48:14,471 WARN  timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:deleteNextEntity(1459)) - Found no start time for reverse related entity tez_appattempt_1447379225800_23982_000001 of type TEZ_APPLICATION_ATTEMPT while deleting dag_1447379225800_23982_1 of type TEZ_DAG_ID
2015-12-02 12:48:14,471 WARN  timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:deleteNextEntity(1459)) - Found no start time for reverse related entity tez_appattempt_1447379225800_23982_000001 of type TEZ_APPLICATION_ATTEMPT while deleting dag_1447379225800_23982_1 of type TEZ_DAG_ID
&lt;/PRE&gt;&lt;P&gt;And checking the volume of the timeline data folder gives the following info:&lt;/P&gt;&lt;PRE&gt;40K     timeline/timeline-state-store.ldb
7.0G    timeline/leveldb-timeline-store.ldb
7.0G    timeline
3.4G    timeline-data/leveldb-timeline-store.ldb
3.4G    timeline-data
&lt;/PRE&gt;&lt;P&gt;With the timeline server failing, I currently cannot see the history of my Spark jobs. Any help is appreciated.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 10:11:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98703#M12080</guid>
      <dc:creator>linh_mtran168</dc:creator>
      <dc:date>2015-12-11T10:11:02Z</dc:date>
    </item>
    <item>
      <title>Re: Yarn timeline server periodically fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98704#M12081</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/1095/linhmtran168.html" nodeid="1095"&gt;@Linh Tran&lt;/A&gt;&lt;P&gt;Please check memory utilization while running the operations.&lt;/P&gt;&lt;P&gt;java.lang.OutOfMemoryError: GC overhead limit exceeded&lt;/P&gt;&lt;P&gt;&lt;A href="http://stackoverflow.com/questions/5839359/java-lang-outofmemoryerror-gc-overhead-limit-exceeded" target="_blank"&gt;http://stackoverflow.com/questions/5839359/java-lang-outofmemoryerror-gc-overhead-limit-exceeded&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 12 Dec 2015 10:53:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98704#M12081</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-12-12T10:53:10Z</dc:date>
    </item>
    <item>
      <title>Re: Yarn timeline server periodically fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98705#M12082</link>
      <description>&lt;P&gt;This looks like it's being triggered by the Spark -&amp;gt; timeline server integration, as ATS is going OOM when handling Spark events.&lt;/P&gt;&lt;P&gt;That means it's my code, running in the Spark jobs, that is triggering this.&lt;/P&gt;&lt;P&gt;What kind of jobs are you running? Short-lived? Long-lived? Many executors?&lt;/P&gt;&lt;P&gt;The best short-term fix is to disable the timeline server integration and set the Spark applications up to log to HDFS instead, with the history server reading the logs from there.&lt;/P&gt;&lt;P&gt;The details of this are covered in &lt;A href="http://spark.apache.org/docs/latest/monitoring.html"&gt;Spark Monitoring&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;1. In the Spark job configuration, disable ATS publishing. Find the line&lt;/P&gt;&lt;PRE&gt;spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService&lt;/PRE&gt;
&lt;P&gt;and delete it.&lt;/P&gt;&lt;P&gt;Then set the property spark.history.fs.logDirectory to an HDFS directory that is writeable by everyone (for example, hdfs://shared/logfiles) and enable event logging:&lt;/P&gt;&lt;PRE&gt;spark.eventLog.enabled true
spark.eventLog.compress true
spark.history.fs.logDirectory hdfs://shared/logfiles
&lt;/PRE&gt;&lt;P&gt;2. In the history server, switch to the filesystem log provider:&lt;/P&gt;&lt;PRE&gt;spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://shared/logfiles
&lt;/PRE&gt;&lt;P&gt;The next Spark release we'll have up for download (soon!) will log fewer events to the timeline server, which should hopefully reduce the problems there. There's also a lot of work going on in the timeline server for future Hadoop versions to handle larger amounts of data, by mixing data kept in HDFS with the LevelDB data.&lt;/P&gt;&lt;P&gt;For now, switching to the filesystem provider is your best bet.&lt;/P&gt;</description>
      <pubDate>Sat, 12 Dec 2015 22:01:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98705#M12082</guid>
      <dc:creator>stevel</dc:creator>
      <dc:date>2015-12-12T22:01:42Z</dc:date>
    </item>
    <item>
      <title>Re: Yarn timeline server periodically fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98706#M12083</link>
      <description>&lt;P&gt;Can I add that there's now a preview of Spark 1.6 on HDP; this one shouldn't overload the timeline server.&lt;/P&gt;</description>
      <pubDate>Sat, 09 Jan 2016 00:33:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98706#M12083</guid>
      <dc:creator>stevel</dc:creator>
      <dc:date>2016-01-09T00:33:52Z</dc:date>
    </item>
    <item>
      <title>Re: Yarn timeline server periodically fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98707#M12084</link>
      <description>&lt;P&gt;Further details on this.&lt;/P&gt;&lt;H3&gt;Configuring the Spark History Server to Use HDFS&lt;/H3&gt;&lt;P&gt;&lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/config-shs-hdfs.html"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/config-shs-hdfs.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 25 Mar 2016 03:16:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Yarn-timeline-server-periodically-fails/m-p/98707#M12084</guid>
      <dc:creator>andrew_sears</dc:creator>
      <dc:date>2016-03-25T03:16:44Z</dc:date>
    </item>
  </channel>
</rss>