<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: PySpark on Zeppelin in sandbox is not loading data in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96856#M10418</link>
    <description>&lt;P&gt;Yup, I have done something similar with PySpark in Zeppelin as well, so it should work.&lt;/P&gt;</description>
    <pubDate>Thu, 12 Nov 2015 02:17:44 GMT</pubDate>
    <dc:creator>abajwa</dc:creator>
    <dc:date>2015-11-12T02:17:44Z</dc:date>
    <item>
      <title>PySpark on Zeppelin in sandbox is not loading data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96853#M10415</link>
      <description>&lt;P&gt;If I execute this code from Zeppelin:&lt;/P&gt;&lt;PRE&gt;%pyspark
base_rdd = sc.textFile("/tmp/philadelphia-crime-data-2015-ytd.csv")
base_rdd.take(10)&lt;/PRE&gt;&lt;P&gt;I am not getting any results back. If I execute the same code from the PySpark CLI, I get valid data. Note: I am running PySpark in local mode in the CLI, not in YARN mode.&lt;/P&gt;&lt;P&gt;Zeppelin does not return any errors, and there are no errors in the log files. I am using the HDP 2.3.2 sandbox.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The same code written in Scala works both in Zeppelin and in the CLI.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Looks like a bug in Zeppelin with PySpark?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2015 11:46:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96853#M10415</guid>
      <dc:creator>azeltov</dc:creator>
      <dc:date>2015-11-11T11:46:37Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark on Zeppelin in sandbox is not loading data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96854#M10416</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/132/abajwa.html" nodeid="132"&gt;@Ali Bajwa&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/140/nsabharwal.html" nodeid="140"&gt;@Neeraj&lt;/A&gt; have u encountered this issue with Pyspark and zeppelin?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2015 21:42:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96854#M10416</guid>
      <dc:creator>azeltov</dc:creator>
      <dc:date>2015-11-11T21:42:11Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark on Zeppelin in sandbox is not loading data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96855#M10417</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/325/azeltov.html" nodeid="325"&gt;@azeltov@hortonworks.com&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I think issues are:&lt;/P&gt;&lt;P&gt;1- you have to use file:// for local files&lt;/P&gt;&lt;P&gt;2- using pyspark, you have to use print before&lt;/P&gt;&lt;P&gt;see example below (working for me):&lt;/P&gt;&lt;PRE&gt;%pyspark 
base_rdd = sc.textFile("file:///usr/hdp/current/spark-client/data/mllib/sample_libsvm_data.txt")
print base_rdd.count()
print base_rdd.take(3)&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Nov 2015 02:08:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96855#M10417</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-11-12T02:08:22Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark on Zeppelin in sandbox is not loading data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96856#M10418</link>
      <description>&lt;P&gt;Yup, I have done something similar with PySpark in Zeppelin as well, so it should work.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Nov 2015 02:17:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96856#M10418</guid>
      <dc:creator>abajwa</dc:creator>
      <dc:date>2015-11-12T02:17:44Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark on Zeppelin in sandbox is not loading data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96857#M10419</link>
      <description>&lt;P&gt;I had to give it a fully qualified hdfs:// URI in PySpark for it to work.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Nov 2015 02:36:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/PySpark-on-Zeppelin-in-sandbox-is-not-loading-data/m-p/96857#M10419</guid>
      <dc:creator>azeltov</dc:creator>
      <dc:date>2015-11-12T02:36:59Z</dc:date>
    </item>
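Both fixes in this thread come down to passing sc.textFile a fully qualified URI instead of a bare path, which Hadoop otherwise resolves against the default filesystem (fs.defaultFS). The sketch below is a minimal illustration under stated assumptions: the helper name qualify is hypothetical, and the authority sandbox.hortonworks.com:8020 is an assumed value for the HDP sandbox NameNode, so check fs.defaultFS in core-site.xml on your own cluster. The function itself is plain Python, so it runs without Spark.

```python
# Hypothetical helper: turn a bare absolute path into a fully qualified URI
# before handing it to sc.textFile.

# ASSUMPTION: NameNode host:port of the HDP sandbox; verify fs.defaultFS in core-site.xml.
DEFAULT_HDFS_AUTHORITY = "sandbox.hortonworks.com:8020"


def qualify(path, local=False, hdfs_authority=DEFAULT_HDFS_AUTHORITY):
    """Return a fully qualified URI for path.

    Paths that already carry a scheme (file://, hdfs://, ...) are returned unchanged.
    Bare absolute paths get file:// when local is True, otherwise hdfs://AUTHORITY.
    """
    # Simple scheme check; good enough for file:// and hdfs:// style paths.
    if "://" in path:
        return path
    if not path.startswith("/"):
        raise ValueError("expected an absolute path, got %r" % path)
    if local:
        # file:// + /tmp/x.csv gives file:///tmp/x.csv (empty authority = local disk)
        return "file://" + path
    return "hdfs://" + hdfs_authority + path


# Example use inside a Zeppelin %pyspark paragraph (sc is provided by Zeppelin there):
#   base_rdd = sc.textFile(qualify("/tmp/philadelphia-crime-data-2015-ytd.csv"))
#   print(base_rdd.take(10))
```

The helper leaves already-qualified URIs untouched, so it is safe to wrap every path passed to sc.textFile; the local flag selects file:// for files on the sandbox's local disk, as suggested for the MLlib sample data.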
  </channel>
</rss>

