<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Log aggregation for Long-running Spark Streaming jobs in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Log-aggregation-for-Long-running-Spark-Streaming-jobs/m-p/51282#M55081</link>
    <description>&lt;P&gt;The documentation for YARN log aggregation says that logs are aggregated after an application completes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does this rule out YARN log aggregation for Spark Streaming jobs? In theory, streaming jobs run for much longer and potentially never terminate. I want to get the Spark Streaming logs into HDFS before the job completes, since streaming jobs can run indefinitely. Is there a good way to get Spark log data into HDFS?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Suri&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 11:08:01 GMT</pubDate>
    <dc:creator>SuriNuthalapati</dc:creator>
    <dc:date>2022-09-16T11:08:01Z</dc:date>
    <item>
      <title>Log aggregation for Long-running Spark Streaming jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Log-aggregation-for-Long-running-Spark-Streaming-jobs/m-p/51282#M55081</link>
      <description>&lt;P&gt;The documentation for YARN log aggregation says that logs are aggregated after an application completes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does this rule out YARN log aggregation for Spark Streaming jobs? In theory, streaming jobs run for much longer and potentially never terminate. I want to get the Spark Streaming logs into HDFS before the job completes, since streaming jobs can run indefinitely. Is there a good way to get Spark log data into HDFS?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Suri&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:08:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Log-aggregation-for-Long-running-Spark-Streaming-jobs/m-p/51282#M55081</guid>
      <dc:creator>SuriNuthalapati</dc:creator>
      <dc:date>2022-09-16T11:08:01Z</dc:date>
    </item>
    <item>
      <title>Re: Log aggregation for Long-running Spark Streaming jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Log-aggregation-for-Long-running-Spark-Streaming-jobs/m-p/51289#M55082</link>
      <description>&lt;P&gt;You can achieve this by setting an appropriate value for the following property in yarn-site.xml:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds&lt;/P&gt;&lt;P&gt;YARN will then aggregate the logs for running jobs too. For example (the interval value shown is illustrative; the default of -1 disables rolling aggregation for running applications):&lt;/P&gt;&lt;PRE&gt;&amp;lt;property&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;name&amp;gt;yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds&amp;lt;/name&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;value&amp;gt;3600&amp;lt;/value&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml" target="_blank"&gt;https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Suri&lt;/P&gt;</description>
      <pubDate>Tue, 21 Feb 2017 22:25:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Log-aggregation-for-Long-running-Spark-Streaming-jobs/m-p/51289#M55082</guid>
      <dc:creator>SuriNuthalapati</dc:creator>
      <dc:date>2017-02-21T22:25:25Z</dc:date>
    </item>
    <item>
      <title>Re: Log aggregation for Long-running Spark Streaming jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Log-aggregation-for-Long-running-Spark-Streaming-jobs/m-p/86108#M55083</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are running a Spark Streaming job on a cluster managed by CM 6. After the streaming job has been running for about 4-5 days, the &lt;STRONG&gt;Spark UI&lt;/STRONG&gt; for that particular job does not open. My nohup driver output file shows messages like this:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;servlet.ServletHandler: Error for /streaming/&lt;/STRONG&gt;&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&lt;STRONG&gt;java.lang.OutOfMemoryError: Java heap space&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;These messages are logged many times in a continuous series.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But my job keeps running fine. It's just that I am not able to open the UI by clicking the Application Master link when I open the job from the YARN Running Applications UI.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Feb 2019 03:59:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Log-aggregation-for-Long-running-Spark-Streaming-jobs/m-p/86108#M55083</guid>
      <dc:creator>Indeep</dc:creator>
      <dc:date>2019-02-08T03:59:30Z</dc:date>
    </item>
  </channel>
</rss>

