not able to save countByValue() RDD to textFile - pyspark
Labels: Apache Spark
Created ‎10-13-2017 06:02 PM
Hi Team,
I am trying to count the number of occurrences of each value in a file.
RDD1 = RDD.flatMap(lambda x: x.split('|'))
RDD2 = RDD1.countByValue()
I want to save the output of RDD2 to a text file.
I am able to see the output with:
for x, y in RDD2.items(): print(x, y)
But when I tried to save it to a text file using RDD2.saveAsTextFile(\path), it did not work.
It threw: AttributeError: 'collections.defaultdict' object has no attribute 'saveAsTextFile'
Can you please help me understand whether I am missing something here, or how to save the countByValue() result to a text file?
Created ‎10-13-2017 06:47 PM
countByValue() returns its result as a local Map collection (a collections.defaultdict in PySpark, as the error message shows), not as an RDD.
saveAsTextFile() is defined on RDDs, not on a map/collection.
Even though you have named the variable RDD2, as shown below, it does not hold an RDD:
RDD2 = RDD1.countByValue()
The Scala API signatures make the difference explicit:
def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long]
Return the count of each unique value in this RDD as a local map of (value, count) pairs.
def saveAsTextFile(path: String): Unit
Save this RDD as a text file, using string representations of elements.
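Since the counts are already a local dictionary on the driver, one way to get them into a file is sketched below. This is only an illustration, not the only approach: the helper names format_counts, save_counts_locally and save_counts_with_spark are hypothetical, sc is assumed to be an existing SparkContext, and the keys are assumed to be strings (as they are after split('|')).

```python
def format_counts(counts):
    """Turn a {value: count} mapping into sorted 'value<TAB>count' lines.

    Assumes the keys are mutually comparable (e.g. all strings).
    """
    return ["{}\t{}".format(value, count)
            for value, count in sorted(counts.items())]


def save_counts_locally(counts, path):
    """Write the counts with plain Python file I/O on the driver machine."""
    with open(path, "w") as out:
        out.write("\n".join(format_counts(counts)) + "\n")


def save_counts_with_spark(sc, counts, path):
    """Turn the local (value, count) pairs back into an RDD and save that.

    `sc` is assumed to be an existing SparkContext; `path` must not exist yet.
    """
    sc.parallelize(list(counts.items())).saveAsTextFile(path)


# Hypothetical usage with the variables from the question:
#   counts = RDD1.countByValue()
#   save_counts_locally(counts, "counts.tsv")
#   save_counts_with_spark(sc, counts, "counts_dir")
```

The local variant is usually enough when the number of distinct values is small, because countByValue() has already pulled everything to the driver. The parallelize variant only makes sense if the output has to land on the cluster's file system.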
Created ‎10-13-2017 07:22 PM
Thank you for the explanation. In that case I would rather use reduceByKey() to get the number of occurrences.
Thanks for the info on countByValue().
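For anyone landing here later, the reduceByKey() approach could look roughly like the sketch below. It keeps the counts as an RDD, so saveAsTextFile() is available on the result. The function name count_tokens and the paths in the usage comment are hypothetical; the input is assumed to be an RDD of '|'-delimited strings, as in the question.

```python
from operator import add


def count_tokens(rdd, sep="|"):
    """Return an RDD of (token, count) pairs for an RDD of delimited lines."""
    return (rdd.flatMap(lambda line: line.split(sep))  # one element per token
               .map(lambda token: (token, 1))          # pair each token with 1
               .reduceByKey(add))                      # sum the 1s per token


# Hypothetical usage, assuming `sc` is an existing SparkContext:
#   lines = sc.textFile("input.txt")
#   count_tokens(lines).saveAsTextFile("token_counts")  # directory must not exist
```

Each output line is the string form of a (token, count) tuple. Unlike countByValue(), nothing is collected to the driver, which matters when there are many distinct values.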
