<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Metrics for a Spark Streaming Operation in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18696#M2916</link>
    <description>By latest, do you mean version 1.1.0?&lt;BR /&gt;&lt;BR /&gt;So the version 1.0.0 that comes with CDH5.1 does not have this feature?</description>
    <pubDate>Fri, 12 Sep 2014 13:10:11 GMT</pubDate>
    <dc:creator>ArunShell</dc:creator>
    <dc:date>2014-09-12T13:10:11Z</dc:date>
    <item>
      <title>Metrics for a Spark Streaming Operation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18686#M2911</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am streaming data in Spark and joining it with a batch file in HDFS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Each join combines one window of the stream with the HDFS data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to measure the time taken by this join (for each window) using the code below, but it did not work: the output was always 0.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am running this code in the spark-shell.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any suggestions on how to achieve this? Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;import java.io.{File, PrintWriter}
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
val jobstarttime = System.currentTimeMillis();
val ssc = new StreamingContext(sc, Seconds(60))
val streamrecs = ssc.socketTextStream("10.11.12.13", 5549)
val streamkv = streamrecs.map(_.split("~")).map(r =&amp;gt; ( r(0), (r(5), r(6))))
val streamwindow = streamkv.window(Minutes(2))
val HDFSlines = sc.textFile("/user/batchdata").map(_.split("~")).map(r =&amp;gt; ( r(1), (r(0))))
val outfile = new PrintWriter(new File("//home//user1//metrics1" ))
val joinstarttime = System.currentTimeMillis();
val join1 = streamwindow.transform(joinRDD =&amp;gt; { joinRDD.join(HDFSlines)} )
val joinsendtime = System.currentTimeMillis();
val jointime = (joinsendtime - joinstarttime)/1000
val J = jointime.toString()
val J1 = "\n Time taken for Joining is " + J
outfile.write(J1)
join1.print()
val savestarttime = System.currentTimeMillis();
join1.saveAsTextFiles("/user/joinone5")
val savesendtime = System.currentTimeMillis();
val savetime = (savesendtime - savestarttime)/1000
val S = savetime.toString()
val S1 = "\n Time taken for Saving is " + S
outfile.write(S1)
ssc.start()
outfile.close()
ssc.awaitTermination()&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:07:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18686#M2911</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2022-09-16T09:07:31Z</dc:date>
    </item>
    <item>
      <title>Re: Metrics for a Spark Streaming Operation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18688#M2912</link>
      <description>&lt;P&gt;The code here doesn't really do any work. It only sets up and configures the work: it describes where the data comes from, how it is transformed, and where it goes. Nothing runs until ssc.start(), so timing the statements before it only measures the setup, which is why the result is always 0.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can already see some timing information in the Spark Streaming UI.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can try computing the timing inside the functions you pass in, since those run when each batch executes. However, even methods like .join() called inside the transform() function are themselves transformations that don't do work immediately, so timing that call would not help. Timing an action, such as one inside foreach, would make sense.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Really, I would start by looking at Spark's built-in timing metrics.&lt;/P&gt;</description>
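      <!--
      Editorial sketch, not part of the original reply. It illustrates the
      suggestion above of timing an action at execution time rather than the
      setup code. It assumes a spark-shell session where `sc` is predefined and
      reuses the host, port, and paths from the question. The names `hdfsLines`
      and `joined` are illustrative, and running `count()` inside `foreachRDD`
      is only one possible way to force the join; it is an assumption, not the
      approach the reply prescribes.

      ```scala
      import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x (automatic in spark-shell)
      import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

      // Assumption: `sc` is the SparkContext that spark-shell provides.
      val ssc = new StreamingContext(sc, Seconds(60))

      // Same sources and key layout as in the question.
      val streamkv = ssc.socketTextStream("10.11.12.13", 5549)
        .map(_.split("~"))
        .map(r => (r(0), (r(5), r(6))))
      val streamwindow = streamkv.window(Minutes(2))
      val hdfsLines = sc.textFile("/user/batchdata")
        .map(_.split("~"))
        .map(r => (r(1), r(0)))

      // Still lazy: this only declares the join for every window.
      val joined = streamwindow.transform(rdd => rdd.join(hdfsLines))

      // foreachRDD runs on the driver once per batch; the count() inside it is
      // an action, so the clock around it covers real work.
      joined.foreachRDD { (rdd, batchTime) =>
        val start = System.nanoTime()
        val n = rdd.count() // forces the windowed join to execute
        val elapsedMs = (System.nanoTime() - start) / 1000000
        println(s"Batch $batchTime: joined $n records in $elapsedMs ms")
      }

      ssc.start()
      ssc.awaitTermination()
      ```

      The measured time covers everything needed to produce the joined RDD for
      that batch (reading the window, re-reading the HDFS file, the shuffle),
      not the join step in isolation, so treat it as a rough per-batch figure.
      -->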
      <pubDate>Fri, 12 Sep 2014 10:30:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18688#M2912</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-09-12T10:30:56Z</dc:date>
    </item>
    <item>
      <title>Re: Metrics for a Spark Streaming Operation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18690#M2913</link>
      <description>&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;By Spark Streaming UI, do you mean the Spark Master UI?&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 11:00:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18690#M2913</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2014-09-12T11:00:30Z</dc:date>
    </item>
    <item>
      <title>Re: Metrics for a Spark Streaming Operation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18692#M2914</link>
      <description>&lt;P&gt;Yes, there is a dedicated Streaming tab in the latest Spark driver UI.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 11:42:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18692#M2914</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-09-12T11:42:14Z</dc:date>
    </item>
    <item>
      <title>Re: Metrics for a Spark Streaming Operation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18694#M2915</link>
      <description>&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 11:54:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18694#M2915</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2014-09-12T11:54:53Z</dc:date>
    </item>
    <item>
      <title>Re: Metrics for a Spark Streaming Operation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18696#M2916</link>
      <description>By latest, do you mean version 1.1.0?&lt;BR /&gt;&lt;BR /&gt;So the version 1.0.0 that comes with CDH5.1 does not have this feature?</description>
      <pubDate>Fri, 12 Sep 2014 13:10:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18696#M2916</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2014-09-12T13:10:11Z</dc:date>
    </item>
    <item>
      <title>Re: Metrics for a Spark Streaming Operation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18698#M2917</link>
      <description>&lt;P&gt;I believe it was added in 1.1, yes. I don't have a streaming app driver handy, so maybe double-check -- you will see an obvious Streaming tab if it's there. Without guaranteeing anything, I think the next CDH will have 1.1, and at any time you can run your own Spark jobs with any version under YARN.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 13:23:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Metrics-for-a-Spark-Streaming-Operation/m-p/18698#M2917</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-09-12T13:23:58Z</dc:date>
    </item>
  </channel>
</rss>

