<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: not able to save countByValue() RDD to textFile - pyspark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/not-able-to-save-countByValue-RDD-to-textFile-pyspark/m-p/220318#M69636</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13511/dineshchitlangia.html" nodeid="13511"&gt;@Dinesh Chitlangia&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thank you for the explanation. In that case, I would rather use reduceByKey() to get the number of occurrences.&lt;/P&gt;&lt;P&gt;Thanks for the info on countByValue().&lt;/P&gt;</description>
    <pubDate>Sat, 14 Oct 2017 02:22:50 GMT</pubDate>
    <dc:creator>sreeviswa_athic</dc:creator>
    <dc:date>2017-10-14T02:22:50Z</dc:date>
    <item>
      <title>not able to save countByValue() RDD to textFile - pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/not-able-to-save-countByValue-RDD-to-textFile-pyspark/m-p/220316#M69634</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I am trying to count the number of occurrences of each value in a file:&lt;/P&gt;&lt;PRE&gt;RDD1 = RDD.flatMap(lambda x: x.split('|'))
RDD2 = RDD1.countByValue()&lt;/PRE&gt;&lt;P&gt;I want to save the output of RDD2 to a text file. I am able to see the output with:&lt;/P&gt;&lt;PRE&gt;for x, y in RDD2.items():
    print(x, y)&lt;/PRE&gt;&lt;P&gt;but when I try to save it to a text file using RDD2.saveAsTextFile(\path), it does not work. It throws:&lt;/P&gt;&lt;PRE&gt;AttributeError: 'collections.defaultdict' object has no attribute 'saveAsTextFile'&lt;/PRE&gt;&lt;P&gt;Can you please help me understand whether I am missing something here, or how to save the countByValue() output to a text file?&lt;/P&gt;</description>
      <pubDate>Sat, 14 Oct 2017 01:02:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/not-able-to-save-countByValue-RDD-to-textFile-pyspark/m-p/220316#M69634</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2017-10-14T01:02:22Z</dc:date>
    </item>
    <item>
      <title>Re: not able to save countByValue() RDD to textFile - pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/not-able-to-save-countByValue-RDD-to-textFile-pyspark/m-p/220317#M69635</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3057/sreeviswaathikala.html" nodeid="3057"&gt;@Viswa&lt;/A&gt; &lt;/P&gt;&lt;P&gt;countByValue() returns its result as a local Map collection (a collections.defaultdict in PySpark, as your error message shows), not as an RDD.&lt;/P&gt;&lt;P&gt;saveAsTextFile() is defined on RDDs, not on maps or other local collections.&lt;/P&gt;&lt;P&gt;Even though you named the variable RDD2, as shown below, the result is not an RDD:&lt;/P&gt;&lt;PRE&gt;RDD2 = RDD1.countByValue()&lt;/PRE&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/1.6.2/api/scala/index.html#org.apache.spark.rdd.RDD"&gt;Here are the definitions&lt;/A&gt; (Scala API):&lt;/P&gt;&lt;PRE&gt;def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long]

Return the count of each unique value in this RDD as a local map of (value, count) pairs.&lt;/PRE&gt;
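&lt;P&gt;Because countByValue() hands back an ordinary local map on the driver, a small result can simply be written with plain Python file I/O instead of saveAsTextFile(). A minimal sketch follows; the helper name save_counts, the file name counts.txt, the sample data and the tab-separated layout are illustrative assumptions, not anything Spark provides, and the file is written to the driver's local filesystem:&lt;/P&gt;&lt;PRE&gt;from collections import defaultdict


def save_counts(counts, path):
    # counts is the plain dict that countByValue() returns on the driver.
    # Write one "value TAB count" line per entry, sorted by value.
    with open(path, "w") as out:
        for value, count in sorted(counts.items()):
            out.write("%s\t%d\n" % (value, count))


# Hypothetical stand-in for RDD2 = RDD1.countByValue()
counts = defaultdict(int, {"a": 2, "b": 1})
save_counts(counts, "counts.txt")&lt;/PRE&gt;&lt;P&gt;This only suits small results, since countByValue() already loads every distinct value into driver memory. To keep the output distributed (for example on HDFS), either turn the map back into an RDD with sc.parallelize(list(RDD2.items())), or compute the counts as a pair RDD with RDD1.map(lambda w: (w, 1)).reduceByKey(add) (after from operator import add), and then call saveAsTextFile() on that RDD, which is defined as:&lt;/P&gt;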
&lt;PRE&gt;def saveAsTextFile(path: String): Unit

Save this RDD as a text file, using string representations of elements.&lt;/PRE&gt;</description>
      <pubDate>Sat, 14 Oct 2017 01:47:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/not-able-to-save-countByValue-RDD-to-textFile-pyspark/m-p/220317#M69635</guid>
      <dc:creator>dineshc</dc:creator>
      <dc:date>2017-10-14T01:47:11Z</dc:date>
    </item>
    <item>
      <title>Re: not able to save countByValue() RDD to textFile - pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/not-able-to-save-countByValue-RDD-to-textFile-pyspark/m-p/220318#M69636</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13511/dineshchitlangia.html" nodeid="13511"&gt;@Dinesh Chitlangia&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thank you for the explanation. In that case, I would rather use reduceByKey() to get the number of occurrences.&lt;/P&gt;&lt;P&gt;Thanks for the info on countByValue().&lt;/P&gt;</description>
      <pubDate>Sat, 14 Oct 2017 02:22:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/not-able-to-save-countByValue-RDD-to-textFile-pyspark/m-p/220318#M69636</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2017-10-14T02:22:50Z</dc:date>
    </item>
  </channel>
</rss>

