<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: spark history server event log question in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/spark-history-server-event-log-question/m-p/408663#M252759</link>
    <description>&lt;P&gt;Hello &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/65114"&gt;@mokkan&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Yes, Spark buffers event data until it reaches 100 KiB and then writes it to the event log location in HDFS.&lt;BR /&gt;If you need events to appear sooner, you can decrease the buffer size.&lt;/P&gt;&lt;P&gt;Reaching the buffer limit does not always trigger an fsync:&lt;BR /&gt;from what I found, Spark uses OutputStream.flush(), which pushes the buffered data to the output stream but does not guarantee that an fsync() is performed.&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Andrés Fallas&lt;BR /&gt;Cloudera Customer Operations Engineer&lt;/P&gt;</description>
    <pubDate>Fri, 23 May 2025 23:38:29 GMT</pubDate>
    <dc:creator>vafs</dc:creator>
    <dc:date>2025-05-23T23:38:29Z</dc:date>
    <item>
      <title>spark history server event log question</title>
      <link>https://community.cloudera.com/t5/Support-Questions/spark-history-server-event-log-question/m-p/408572#M252746</link>
      <description>&lt;P&gt;Our Spark job writes its event log to &lt;SPAN&gt;hdfs://namenode:8021/spark-history, but the job generates many events within 10 to 12 minutes.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Per the documentation, spark.eventLog.buffer.kb = 100k.&lt;/P&gt;&lt;P&gt;Does this mean an event write to /spark-history/application_xxxxxx_xx/xx happens only when the buffer reaches 100 KiB? And does each of those writes call fsync?&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;spark.eventLog.buffer.kb&lt;/TD&gt;&lt;TD&gt;100k&lt;/TD&gt;&lt;TD&gt;Buffer size to use when writing to output streams, in KiB unless otherwise specified.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
      <pubDate>Wed, 21 May 2025 18:18:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/spark-history-server-event-log-question/m-p/408572#M252746</guid>
      <dc:creator>mokkan</dc:creator>
      <dc:date>2025-05-21T18:18:07Z</dc:date>
    </item>
    <item>
      <title>Re: spark history server event log question</title>
      <link>https://community.cloudera.com/t5/Support-Questions/spark-history-server-event-log-question/m-p/408663#M252759</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/65114"&gt;@mokkan&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Yes, Spark buffers event data until it reaches 100 KiB and then writes it to the event log location in HDFS.&lt;BR /&gt;If you need events to appear sooner, you can decrease the buffer size.&lt;/P&gt;&lt;P&gt;Reaching the buffer limit does not always trigger an fsync:&lt;BR /&gt;from what I found, Spark uses OutputStream.flush(), which pushes the buffered data to the output stream but does not guarantee that an fsync() is performed.&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Andrés Fallas&lt;BR /&gt;Cloudera Customer Operations Engineer&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 23:38:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/spark-history-server-event-log-question/m-p/408663#M252759</guid>
      <dc:creator>vafs</dc:creator>
      <dc:date>2025-05-23T23:38:29Z</dc:date>
    </item>
  </channel>
</rss>

