<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to do logging in Spark Applications without using actions in logger statements? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146569#M28234</link>
<description>&lt;P&gt;As long as logging is enabled, a lot will show up in the history and in the logs.&lt;/P&gt;&lt;P&gt;In your Spark job setup:&lt;/P&gt;&lt;P&gt;  sparkConf.set("spark.eventLog.enabled","true")&lt;/P&gt;&lt;P&gt;Then check the Spark History Server.&lt;/P&gt;&lt;P&gt;You can also use old-fashioned log4j logging:&lt;/P&gt;&lt;PRE&gt;import org.apache.log4j.{Level, Logger}

// your application's own logger
val logger: Logger = Logger.getLogger("My.Example.Code.Rules")
// quiet Spark's internal logging down to warnings and errors
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.apache.spark.storage.BlockManager").setLevel(Level.ERROR)
// keep your own messages at INFO
logger.setLevel(Level.INFO)
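
// usage sketch (added illustration): once the levels are set, ordinary log calls work as usual
logger.info("pipeline defined; transformations run when an action is finally called")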

&lt;/PRE&gt;&lt;P&gt;You can set it to INFO, but expect a lot of noise.&lt;/P&gt;</description>
    <pubDate>Sat, 14 May 2016 01:45:49 GMT</pubDate>
    <dc:creator>TimothySpann</dc:creator>
    <dc:date>2016-05-14T01:45:49Z</dc:date>
    <item>
      <title>How to do logging in Spark Applications without using actions in logger statements?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146567#M28232</link>
<description>&lt;P&gt;I am trying to capture logs for my application before and after a Spark transformation statement. Because transformations are evaluated lazily, the logs get printed before the transformation has actually run. Is there a way to capture logs without calling any Spark action in the log statements, avoiding unnecessary CPU consumption?&lt;/P&gt;
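&lt;P&gt;For example, something like this (a simplified sketch; the logger and the RDD of strings named rdd are just illustrative):&lt;/P&gt;&lt;PRE&gt;
import org.apache.log4j.Logger

val logger = Logger.getLogger("MyApp")

logger.info("about to filter")         // prints immediately
val filtered = rdd.filter(_.nonEmpty)  // lazy: nothing executes yet
logger.info("filter finished")         // also prints immediately, before any work happens
val n = filtered.count()               // the filter only actually runs here
&lt;/PRE&gt;</description>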
      <pubDate>Fri, 13 May 2016 20:50:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146567#M28232</guid>
      <dc:creator>psingh15</dc:creator>
      <dc:date>2016-05-13T20:50:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to do logging in Spark Applications without using actions in logger statements?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146568#M28233</link>
      <description>&lt;P&gt;
	Hi Puneet:  I'm not 100% certain I understand your question, but let me suggest:
&lt;/P&gt;
&lt;P&gt;
	If you have a DataFrame or RDD (resilient distributed dataset in memory) and you want to see before/after state for a given transformation, you could run a relatively low-cost action such as take() or sample() to print a few elements of your DataFrame. These operations only return a few elements to the driver. Full documentation for DataFrame.take() is here:
&lt;/P&gt;
&lt;P&gt;
	&lt;A href="http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame"&gt;http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame&lt;/A&gt;
&lt;/P&gt;
&lt;P&gt;
	Excerpt here:
&lt;/P&gt;
&lt;PRE&gt;
DataFrame class:
def take(n: Int): Array[Row]
Returns the first n rows in the DataFrame.
Running take requires moving data into the application's driver process, and doing so with a very large 'n' can crash the driver process with OutOfMemoryError.
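&lt;/PRE&gt;
&lt;P&gt;
	A minimal sketch of that approach (just an illustration, assuming a DataFrame named df with an "age" column and a log4j logger): take(3) is still an action, but it only brings a few rows back to the driver, so it is a cheap way to log before/after state.
&lt;/P&gt;
&lt;PRE&gt;
import org.apache.log4j.Logger

val logger = Logger.getLogger("MyApp")

// take(3) is an action, but it only moves three rows to the driver
logger.info(s"before filter: ${df.take(3).mkString("; ")}")
val withAge = df.filter(df("age").isNotNull)
logger.info(s"after filter: ${withAge.take(3).mkString("; ")}")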
&lt;/PRE&gt;</description>
      <pubDate>Sat, 14 May 2016 01:34:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146568#M28233</guid>
      <dc:creator>phargis</dc:creator>
      <dc:date>2016-05-14T01:34:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to do logging in Spark Applications without using actions in logger statements?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146569#M28234</link>
<description>&lt;P&gt;As long as logging is enabled, a lot will show up in the history and in the logs.&lt;/P&gt;&lt;P&gt;In your Spark job setup:&lt;/P&gt;&lt;P&gt;  sparkConf.set("spark.eventLog.enabled","true")&lt;/P&gt;&lt;P&gt;Then check the Spark History Server.&lt;/P&gt;&lt;P&gt;You can also use old-fashioned log4j logging:&lt;/P&gt;&lt;PRE&gt;import org.apache.log4j.{Level, Logger}

// your application's own logger
val logger: Logger = Logger.getLogger("My.Example.Code.Rules")
// quiet Spark's internal logging down to warnings and errors
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.apache.spark.storage.BlockManager").setLevel(Level.ERROR)
// keep your own messages at INFO
logger.setLevel(Level.INFO)
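
// usage sketch (added illustration): once the levels are set, ordinary log calls work as usual
logger.info("pipeline defined; transformations run when an action is finally called")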

&lt;/PRE&gt;&lt;P&gt;You can set it to INFO, but expect a lot of noise.&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 01:45:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146569#M28234</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-05-14T01:45:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to do logging in Spark Applications without using actions in logger statements?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146570#M28235</link>
<description>&lt;P&gt;Thanks for the input. Yes, that is a solution, but as I mentioned, I don't want to call any action. What I am expecting is something like SparkContext.getLogger().info("message") that would be lazily evaluated when the action is finally called.&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 22:15:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146570#M28235</guid>
      <dc:creator>psingh15</dc:creator>
      <dc:date>2016-05-14T22:15:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to do logging in Spark Applications without using actions in logger statements?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146571#M28236</link>
<description>&lt;P&gt;Yes, relying on Spark's own logs is a solution, but it takes away the freedom to log custom messages. What I am expecting is something like SparkContext.getLogger().info("message") that would be lazily evaluated when the action is finally called.&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 22:17:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146571#M28236</guid>
      <dc:creator>psingh15</dc:creator>
      <dc:date>2016-05-14T22:17:17Z</dc:date>
    </item>
    <item>
      <title>Re: How to do logging in Spark Applications without using actions in logger statements?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146572#M28237</link>
<description>&lt;P&gt;Hi Puneet - Were you able to solve this problem? I have a similar requirement but am not sure how to enable lazy evaluation for logging purposes. I am also trying to stay away from triggering actions like .first or .take, as my files are huge.&lt;/P&gt;&lt;P&gt;I found this link - &lt;A href="http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala" target="_blank"&gt;http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala&lt;/A&gt;, but it does not seem to work with my code.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2016 10:39:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146572#M28237</guid>
      <dc:creator>patnaik_sanat</dc:creator>
      <dc:date>2016-12-14T10:39:12Z</dc:date>
    </item>
    <item>
      <title>Re: How to do logging in Spark Applications without using actions in logger statements?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146573#M28238</link>
<description>&lt;P&gt;The Spark logging code is Spark's internal Logging trait, which lazily evaluates expressions like&lt;/P&gt;&lt;PRE&gt;logInfo(s"status $value") &lt;/PRE&gt;&lt;P&gt;Sadly, that trait is private to the Spark code, so you can't use it outside Spark itself. See &lt;A href="https://issues.apache.org/jira/browse/SPARK-13928"&gt;SPARK-13928&lt;/A&gt; for the discussion, and know that I don't really agree with the decision.&lt;/P&gt;&lt;P&gt;When I was moving some code from org.apache.spark to a different package, I ended up having to copy &amp;amp; paste the entire Spark logging trait into my own code. Not ideal, but it works: &lt;A href="https://github.com/steveloughran/spark-cloud-examples/blob/master/cloud-examples/src/main/scala/com/hortonworks/spark/cloud/CloudLogging.scala"&gt;CloudLogging.scala&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Bear in mind that underneath, Spark uses SLF4J and whatever backs it, such as log4j; you can use SLF4J directly and get its deferred message formatting with log.info("status {}", value). However, the Spark lazy string evaluation is easier to use, and I believe it is even lazy about evaluating functions inside the strings (e.g. s"users = ${users.count()}"), so it can be more efficient.&lt;/P&gt;&lt;P&gt;The CloudLogging class I've linked to shows how Spark binds to SLF4J; feel free to grab and use it.&lt;/P&gt;
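&lt;P&gt;A cut-down sketch of the idea (an illustration, not the actual CloudLogging class): a serializable trait with a @transient lazy logger and a call-by-name log method, so the message expression is only evaluated when the level is enabled.&lt;/P&gt;&lt;PRE&gt;
import org.slf4j.{Logger, LoggerFactory}

// Rough stand-in for Spark's (now private) Logging trait.
trait LazyLogging extends Serializable {
  // @transient + lazy: the logger is not captured in serialized closures
  // and is re-created where it is first used
  @transient protected lazy val log: Logger =
    LoggerFactory.getLogger(getClass.getName)

  // call-by-name parameter: the string is only built if INFO is enabled
  protected def logInfo(msg: =&gt; String): Unit =
    if (log.isInfoEnabled) log.info(msg)
}

class MyJob extends LazyLogging {
  def run(names: Seq[String]): Unit = {
    // SLF4J parameterized style: formatting is deferred until needed
    log.info("processing {} names", names.size)
    // Spark-style lazy string: not evaluated at all unless INFO is enabled
    logInfo(s"first few: ${names.take(3).mkString(", ")}")
  }
}
&lt;/PRE&gt;&lt;P&gt;Note that even with the call-by-name version, something like users.count() inside the message still triggers a Spark action when the message is actually built, so it is lazy with respect to the log level, not with respect to Spark's execution.&lt;/P&gt;</description>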
      <pubDate>Wed, 14 Dec 2016 22:48:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-logging-in-Spark-Applications-without-using/m-p/146573#M28238</guid>
      <dc:creator>stevel</dc:creator>
      <dc:date>2016-12-14T22:48:33Z</dc:date>
    </item>
  </channel>
</rss>

