<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: CDH 5.6 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44364#M22303</link>
    <description>&lt;P&gt;Hortonworks HDP 2.4 includes it (v1.6.0).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Anyway, SparkR has been part of the Spark project since 1.4 (see the &lt;A href="https://amplab-extras.github.io/SparkR-pkg/" target="_self"&gt;old AMPLab project page&lt;/A&gt;), so I don't understand why Cloudera can't just ship it along with the rest of the Spark package.&amp;nbsp; It seems like a conscious decision to &lt;EM&gt;remove&lt;/EM&gt; the module. What's the reason?&lt;/P&gt;</description>
    <pubDate>Thu, 25 Aug 2016 16:50:19 GMT</pubDate>
    <dc:creator>MilesYao</dc:creator>
    <dc:date>2016-08-25T16:50:19Z</dc:date>
    <item>
      <title>CDH 5.6 &amp; Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/38480#M22300</link>
      <description>&lt;P&gt;The &lt;A href="http://www.cloudera.com/documentation/enterprise/latest/topics/spark.html" target="_self"&gt;Spark guide&lt;/A&gt; mentions that CDH Spark lacks some features, such as Spark SQL for PySpark and the new spark.ml API. Where can I find more information on the changes that Cloudera made to Apache Spark for CDH 5.6? What is the base version of Spark being used? (CDH 5.5.2 uses Spark 1.5.0, afaik.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Peter&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:07:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/38480#M22300</guid>
      <dc:creator>jbowles</dc:creator>
      <dc:date>2022-09-16T10:07:59Z</dc:date>
    </item>
    <item>
      <title>Re: CDH 5.6</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/38482#M22301</link>
      <description>That's not what it says; it says they just aren't supported, typically because they're not "supported" in Spark either (e.g. an experimental API). Supported != doesn't work; it just means you can't file a support ticket for it.&lt;BR /&gt;&lt;BR /&gt;CDH 5.6 = Spark 1.5 + patches, meaning it's likely close to 1.5.2 with a slightly different set of maintenance patches. It might omit unimportant ones that arguably shouldn't be in a maintenance release, or it might include a critical one that was created after 1.5.2. Generally speaking, there are no other differences; it's just upstream Spark with some tinkering with versions to make it integrate correctly with other Hadoop components.&lt;BR /&gt;&lt;BR /&gt;The exception is SparkR, which isn't even shipped, partly because CDH can't ship R itself.&lt;BR /&gt;</description>
      <pubDate>Wed, 09 Mar 2016 09:21:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/38482#M22301</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2016-03-09T09:21:26Z</dc:date>
    </item>
    <item>
      <title>Re: CDH 5.6</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/38501#M22302</link>
      <description>thanks Sean!</description>
      <pubDate>Wed, 09 Mar 2016 13:53:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/38501#M22302</guid>
      <dc:creator>jbowles</dc:creator>
      <dc:date>2016-03-09T13:53:44Z</dc:date>
    </item>
    <item>
      <title>Re: CDH 5.6</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44364#M22303</link>
      <description>&lt;P&gt;Hortonworks HDP 2.4 includes it (v1.6.0).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Anyway, SparkR has been part of the Spark project since 1.4 (see the &lt;A href="https://amplab-extras.github.io/SparkR-pkg/" target="_self"&gt;old AMPLab project page&lt;/A&gt;), so I don't understand why Cloudera can't just ship it along with the rest of the Spark package.&amp;nbsp; It seems like a conscious decision to &lt;EM&gt;remove&lt;/EM&gt; the module. What's the reason?&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2016 16:50:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44364#M22303</guid>
      <dc:creator>MilesYao</dc:creator>
      <dc:date>2016-08-25T16:50:19Z</dc:date>
    </item>
    <item>
      <title>Re: CDH 5.6</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44365#M22304</link>
      <description>&lt;P&gt;See my reply above. You'd be surprised how many people &lt;EM&gt;complain&lt;/EM&gt; about shipping things that aren't supported. It's about as many as complain about not shipping things that aren't supported.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Specific to R: shipping or otherwise arranging to install R is a small barrier, because R is GPL-licensed and can't ship with CDH. This ultimately isn't a big barrier.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Supportability is also a moderate issue. It's not trivial to get the whole support machine able to actually provide support for a new environment and technology, and R is not just another big data tool. Again, that's more a question of effort.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Maturity is a moderate issue. The API continued to change over Spark 1.x. For a while you could dapply code across the cluster, then it was removed, then it was added back. That's more an argument that this sort of thing is hard to &lt;EM&gt;support&lt;/EM&gt; rather than to &lt;EM&gt;ship&lt;/EM&gt;, but the two are linked.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Lastly, it's really about demand. People do seem interested in "parallelizing R code", but that's not what SparkR does. They also use third-party tools like H2O + R or Revo. It hasn't been something people actually want to pay for support on.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2016 17:00:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44365#M22304</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2016-08-25T17:00:04Z</dc:date>
    </item>
    <item>
      <title>Re: CDH 5.6</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44419#M22305</link>
      <description>&lt;P&gt;Thanks for your detailed reply.&amp;nbsp; That's a valid and understandable concern.&amp;nbsp; We chose Cloudera for our production Hadoop platform precisely for the quality of integration and maturity you offer.&amp;nbsp; As users, we simply need some clarity from the vendor about observed feature discrepancies from the official distro, especially for such a critical component as Spark.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Are there any other discrepancies or customizations that we should be aware of?&amp;nbsp; Can Cloudera be more transparent in its release notes whenever it removes or modifies features from the official open-source versions?&amp;nbsp; Searching for "SparkR" in the &lt;A href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_57.html" target="_self"&gt;CDH 5.7 release notes&lt;/A&gt; for Spark finds 4 JIRAs, which gives the impression that SparkR &lt;EM&gt;is&lt;/EM&gt; included.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks again,&lt;/P&gt;&lt;P&gt;Miles&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 16:16:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44419#M22305</guid>
      <dc:creator>MilesYao</dc:creator>
      <dc:date>2016-08-26T16:16:14Z</dc:date>
    </item>
    <item>
      <title>Re: CDH 5.6</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44421#M22306</link>
      <description>&lt;P&gt;It has always been documented in "Known Issues": &lt;A href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_spark_ki.html" target="_blank"&gt;https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_spark_ki.html&lt;/A&gt;&amp;nbsp; Generally speaking, there aren't differences. Not supported != different. However, some pieces aren't shipped, like the Thrift server and SparkR.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Usually differences crop up when upstream introduces a breaking change that can't be followed in a minor release. For example, the default in CDH is for the "legacy" memory config parameters to be active, so that the default memory config doesn't change in 1.6. Sometimes it relates to other things in the platform that can't change; I think the Akka version is (was) different because other parts of Hadoop needed a different version.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The biggest example of this, IMHO, is Spark Streaming + Kafka. Spark 1.x doesn't support Kafka 0.9+, but CDH 5.7+ had to move to it to get security features. So CDH Spark 1.6 will actually only work with 0.9+, because the Kafka differences are mutually incompatible. That's good in that you can use recent Kafka, but it is a difference!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Most of it, though, consists of warnings about incompatibilities between what Spark happens to support and what CDH ships in other components.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 16:18:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-6-amp-Spark/m-p/44421#M22306</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2016-08-26T16:18:12Z</dc:date>
    </item>
  </channel>
</rss>

