<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Question: Using Kryo Serializer with Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143410#M56449</link>
    <description>&lt;P&gt;I have been trying to change the data serializer for Spark jobs running in my Hortonworks Sandbox (v2.5) from the default Java serializer to the Kryo serializer, as suggested in multiple places (&lt;EM&gt;e.g.&lt;/EM&gt;, the &lt;A href="http://spark.apache.org/docs/latest/tuning.html#data-serialization"&gt;Spark tuning guide&lt;/A&gt; and, more specifically, the &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_spark-guide/content/ch_tuning-spark.html"&gt;HDP 2.3.4 Spark tuning guide&lt;/A&gt;). I tried editing both /usr/hdp/current/spark-client/conf/spark-env.sh and /usr/hdp/current/spark-historyserver/conf/spark-env.sh to include the following:&lt;/P&gt;&lt;PRE&gt;SPARK_JAVA_OPTS+='
 -Dspark.serializer=org.apache.spark.serializer.KryoSerializer
 -Dspark.kryo.registrator=org.apache.spark.graphx.GraphKryoRegistrator '
export SPARK_JAVA_OPTS&lt;/PRE&gt;&lt;P&gt;as recommended in the &lt;A href="https://databricks-training.s3.amazonaws.com/graph-analytics-with-graphx.html"&gt;Databricks GraphX training material&lt;/A&gt; (near the bottom of the page). However, when I restart Spark using Ambari, these files get overwritten and revert back to their original form (&lt;EM&gt;i.e.&lt;/EM&gt;, without the above JAVA_OPTS lines). Other questions and posts on this topic all recommend Kryo serialization without explaining how to enable it, especially within a Hortonworks Sandbox.&lt;/P&gt;&lt;P&gt;I have been using Zeppelin notebooks to experiment with Spark and build some training pages. Performance has not noticeably suffered yet, but I would like to follow best practices, and this is one I can't crack. I have also looked through the Spark Configs page in Ambari, and it is not clear how to add this as a configuration.&lt;/P&gt;&lt;P&gt;How do I make Kryo the default serializer for my Spark instance in the HDP 2.5 Sandbox (running inside a VirtualBox VM on my Windows 10 laptop, if it matters :))? I think I see how to set it when starting a Spark shell (or PySpark shell) by passing the appropriate configuration to the SparkContext, but I don't want to do that every time I use Spark, or Zeppelin with the Spark interpreter.&lt;/P&gt;</description>
    <pubDate>Wed, 08 Mar 2017 00:21:07 GMT</pubDate>
    <dc:creator>willett_evan</dc:creator>
    <dc:date>2017-03-08T00:21:07Z</dc:date>
    <item>
      <title>Using Kryo Serializer with Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143410#M56449</link>
      <pubDate>Wed, 08 Mar 2017 00:21:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143410#M56449</guid>
      <dc:creator>willett_evan</dc:creator>
      <dc:date>2017-03-08T00:21:07Z</dc:date>
    </item>
    <item>
      <title>Re: Using Kryo Serializer with Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143411#M56450</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/16479/willett-evan.html" nodeid="16479"&gt;@Evan Willett&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;The official Spark documentation says this:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;The only reason Kryo is not the default is because of the custom registration requirement, but we recommend trying it in any network-intensive application.&lt;/P&gt;&lt;P&gt;Since Spark 2.0.0, we internally use Kryo serializer when shuffling RDDs with simple types, arrays of simple types, or string type.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Link:&lt;/P&gt;&lt;P&gt;&lt;A href="http://spark.apache.org/docs/latest/tuning.html#data-serialization"&gt;http://spark.apache.org/docs/latest/tuning.html#data-serialization&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Mar 2017 01:26:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143411#M56450</guid>
      <dc:creator>adnanalvee</dc:creator>
      <dc:date>2017-03-10T01:26:23Z</dc:date>
    </item>
    <item>
      <title>Re: Using Kryo Serializer with Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143412#M56451</link>
      <description>&lt;P&gt;From your description, it seems you are asking how to set the parameter through Ambari.&lt;/P&gt;&lt;BLOCKQUOTE&gt;However, when I restart Spark using Ambari, these files get overwritten and revert back to their original form (&lt;EM&gt;i.e.&lt;/EM&gt;, without the above JAVA_OPTS lines).
&lt;/BLOCKQUOTE&gt;&lt;P&gt;You should set the parameter through Ambari instead of editing the file directly.&lt;/P&gt;&lt;P&gt;1. Open your Ambari UI (e.g., &lt;A href="http://hdp26-1:8080/" target="_blank"&gt;http://hdp26-1:8080/&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;2. Click Spark2 in the left pane.&lt;/P&gt;&lt;P&gt;3. Click `Configs` on the Spark2 page.&lt;/P&gt;&lt;P&gt;4. Under "Advanced spark2-env", find the "content" field. It holds the same content as `spark-env.sh`, managed by Ambari; add your lines there and save.&lt;/P&gt;&lt;P&gt;Could you try the above?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Mar 2017 02:21:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143412#M56451</guid>
      <dc:creator>dhyun</dc:creator>
      <dc:date>2017-03-10T02:21:29Z</dc:date>
    </item>
    <item>
      <title>Re: Using Kryo Serializer with Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143413#M56452</link>
      <description>&lt;P&gt;Thanks so much!  That worked.  The same solution works for Spark 1.6 operating within HDP 2.5, which is what I was using.  &lt;/P&gt;</description>
      <pubDate>Fri, 10 Mar 2017 02:49:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143413#M56452</guid>
      <dc:creator>willett_evan</dc:creator>
      <dc:date>2017-03-10T02:49:54Z</dc:date>
    </item>
    <item>
      <title>Re: Using Kryo Serializer with Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143414#M56453</link>
      <description>&lt;P&gt;Great! &lt;A rel="user" href="https://community.cloudera.com/users/16479/willett-evan.html" nodeid="16479"&gt;@Evan Willett&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Mar 2017 02:51:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143414#M56453</guid>
      <dc:creator>dhyun</dc:creator>
      <dc:date>2017-03-10T02:51:27Z</dc:date>
    </item>
    <item>
      <title>Re: Using Kryo Serializer with Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143415#M56454</link>
      <description>&lt;P&gt;Hi &lt;A href="https://community.hortonworks.com/users/16479/willett-evan.html"&gt;@Evan Willett&lt;/A&gt; could you plz share steps for what are you did? &lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 22:13:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Kryo-Serializer-with-Spark/m-p/143415#M56454</guid>
      <dc:creator>mahmoud_kamel10</dc:creator>
      <dc:date>2017-10-11T22:13:49Z</dc:date>
    </item>
  </channel>
</rss>

