not able to save countByValue() RDD to textFile - pyspark

Super Collaborator

Hi Team,

I am trying to count the number of occurrences of each value in a file.

RDD1 = RDD.flatMap(lambda x:x.split('|'))

RDD2 = RDD1.countByValue()

I want to save the output of RDD2 to a text file.

I am able to see the output with

for x,y in RDD2.items():print(x,y)

But when I try to save it to a text file using RDD2.saveAsTextFile(\path), it does not work.

It throws: AttributeError: 'collections.defaultdict' object has no attribute 'saveAsTextFile'

Can you please help me understand whether I am missing something here, or how to save the countByValue() result to a text file?

1 ACCEPTED SOLUTION

@Viswa

countByValue() returns its result as a local Map collection (a defaultdict in PySpark, as the error message shows), not an RDD.

saveAsTextFile() is defined on an RDD, not on a map/collection.

Even though you have named the variable RDD2, as shown below, the result is not an RDD:

RDD2 = RDD1.countByValue()

Here are the definitions:

def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long]

Return the count of each unique value in this RDD as a local map of (value, count) pairs.

def saveAsTextFile(path: String): Unit

Save this RDD as a text file, using string representations of elements.
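
As a minimal sketch of one way around this (not part of the original answer): because countByValue() hands back a local dict-like object on the driver, it can be turned into lines of text and, if Spark-style output is wanted, passed back through sc.parallelize() before calling saveAsTextFile(). The helper names format_counts and save_counts, the tab delimiter, and the sc and path arguments are assumptions made for illustration only.

```python
from collections import defaultdict

def format_counts(counts, delimiter="\t"):
    """Turn a {value: count} mapping into sorted 'value<delimiter>count' lines."""
    return ["{}{}{}".format(value, delimiter, count)
            for value, count in sorted(counts.items())]

def save_counts(sc, counts, path):
    """Hypothetical helper: sc is an existing SparkContext, path an output directory."""
    # parallelize() turns the local list back into an RDD,
    # and an RDD does have saveAsTextFile().
    sc.parallelize(format_counts(counts)).saveAsTextFile(path)

# Stand-in for the result of RDD1.countByValue(); in PySpark that is a local defaultdict.
sample_counts = defaultdict(int, {"a": 3, "b": 1, "c": 2})
lines = format_counts(sample_counts)
print(lines)
```

Since the counts already sit on the driver, writing them out with ordinary Python file I/O is another option when the result is small.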


2 REPLIES

Super Collaborator

@Dinesh Chitlangia

Thank you for the explanation. In that case I would rather use reduceByKey() to get the number of occurrences.

Thanks for the info on countByValue().
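
For reference, the reduceByKey() approach mentioned above might look roughly like the sketch below. Unlike countByValue(), reduceByKey() returns an RDD, so saveAsTextFile() is available on the result. The function names count_occurrences and save_occurrences, the comma-separated output format, and the path argument are assumptions for illustration; rdd is assumed to be an RDD of text lines like the one in the question.

```python
from operator import add

def count_occurrences(rdd, delimiter="|"):
    """Return an RDD of (value, count) pairs for every delimited value in rdd."""
    return (rdd.flatMap(lambda line: line.split(delimiter))  # one element per value
               .map(lambda value: (value, 1))                # pair each value with 1
               .reduceByKey(add))                            # sum the 1s per value

def save_occurrences(rdd, path, delimiter="|"):
    """Hypothetical helper: write one 'value,count' line per distinct value under path."""
    (count_occurrences(rdd, delimiter)
        .map(lambda pair: "{},{}".format(pair[0], pair[1]))
        .saveAsTextFile(path))
```

The counting stays distributed here, which also avoids pulling every distinct value onto the driver the way countByValue() does.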