<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question spark - spark socketexception connection reset by peer in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/spark-spark-socketexception-connection-reset-by-peer/m-p/196930#M62323</link>
    <description>&lt;P&gt;I was trying out a Spark scenario: loading a CSV file of around 1M records into an RDD.&lt;/P&gt;&lt;P&gt;I split each line by the delimiter and checked count(), which worked.&lt;/P&gt;&lt;P&gt;On the same RDD I wanted to look at sample data and tried the action take(10), which did not work.&lt;/P&gt;&lt;P&gt;It threw a SocketException (connection reset by peer).&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15988-sockect-exception.jpg" style="width: 1782px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/17875i03AE6AF52A9DDD3C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15988-sockect-exception.jpg" alt="15988-sockect-exception.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Any assistance would be a great help.&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 06:22:00 GMT</pubDate>
    <dc:creator>sreeviswa_athic</dc:creator>
    <dc:date>2019-08-18T06:22:00Z</dc:date>
    <item>
      <title>spark - spark socketexception connection reset by peer</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/spark-spark-socketexception-connection-reset-by-peer/m-p/196930#M62323</link>
      <description>&lt;P&gt;I was trying out a Spark scenario: loading a CSV file of around 1M records into an RDD.&lt;/P&gt;&lt;P&gt;I split each line by the delimiter and checked count(), which worked.&lt;/P&gt;&lt;P&gt;On the same RDD I wanted to look at sample data and tried the action take(10), which did not work.&lt;/P&gt;&lt;P&gt;It threw a SocketException (connection reset by peer).&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15988-sockect-exception.jpg" style="width: 1782px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/17875i03AE6AF52A9DDD3C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15988-sockect-exception.jpg" alt="15988-sockect-exception.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Any assistance would be a great help.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 06:22:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/spark-spark-socketexception-connection-reset-by-peer/m-p/196930#M62323</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2019-08-18T06:22:00Z</dc:date>
    </item>
    <item>
      <title>Re: spark - spark socketexception connection reset by peer</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/spark-spark-socketexception-connection-reset-by-peer/m-p/196931#M62324</link>
      <description>&lt;P&gt;If you are using PySpark, there appears to be a bug where PySpark crashes on large datasets:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/SPARK-12261"&gt;https://issues.apache.org/jira/browse/SPARK-12261&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Since you only want to see sample data, you could call collect() and then print the result.&lt;/P&gt;&lt;P&gt;However, collect() should not be used on large datasets, because it brings all the data to the driver node and can make the driver run out of memory.&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/1.6.0/programming-guide.html#printing-elements-of-an-rdd"&gt;This link&lt;/A&gt; explains in detail how to print RDD elements using Scala.&lt;/P&gt;&lt;P&gt;&lt;A href="http://spark.apache.org/docs/2.1.0/api/python/pyspark.html"&gt;Refer here for PySpark.&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jun 2017 22:32:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/spark-spark-socketexception-connection-reset-by-peer/m-p/196931#M62324</guid>
      <dc:creator>dineshc</dc:creator>
      <dc:date>2017-06-05T22:32:39Z</dc:date>
    </item>
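    <!--
The reply above suggests collect() for inspecting sample data, while warning that it pulls the whole RDD to the driver. One way to keep that call bounded is to cut the RDD down to its first n rows before collecting. The following is a minimal PySpark sketch under stated assumptions: the helper names parse_line, first_rows and main, and the input path data.csv, are hypothetical and do not come from the thread. Whether this sidesteps the crash tracked in SPARK-12261 in a given environment has not been verified here.

```python
def parse_line(line, delimiter=","):
    """Split one text line into fields (naive split, no CSV quoting rules)."""
    return line.split(delimiter)


def first_rows(rdd, n=10):
    """Return the first n parsed rows as a list on the driver.

    The RDD is filtered down to n elements before collect(), so the
    driver receives at most n rows instead of the full dataset.
    """
    return (
        rdd.map(parse_line)
        .zipWithIndex()                      # (row, index) pairs
        .filter(lambda pair: pair[1] < n)    # keep indices 0 .. n-1
        .keys()                              # drop the index again
        .collect()
    )


def main(path="data.csv"):  # hypothetical input path
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    try:
        lines = sc.textFile(path)
        print(lines.count())
        for row in first_rows(lines):
            print(row)
    finally:
        sc.stop()
```

Calling main() needs a working PySpark installation. Note that zipWithIndex() runs an extra Spark job when the RDD has more than one partition, so this trades some cluster work for a small, bounded result on the driver.
    -->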
    <item>
      <title>Re: spark - spark socketexception connection reset by peer</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/spark-spark-socketexception-connection-reset-by-peer/m-p/196932#M62325</link>
      <description>&lt;P&gt;I tried calling collect() on the RDD and printing the results with a for loop, and that worked fine.&lt;/P&gt;&lt;P&gt;I was trying this out in PySpark, though.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jun 2017 01:06:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/spark-spark-socketexception-connection-reset-by-peer/m-p/196932#M62325</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2017-06-06T01:06:01Z</dc:date>
    </item>
  </channel>
</rss>

