not able to save countByValue() RDD to textFile - pyspark

Expert Contributor

Hi Team,

I am trying to count the number of occurrences of each value in a file.

RDD1 = RDD.flatMap(lambda x:x.split('|'))

RDD2 = RDD1.countByValue()

I want to save the output of RDD2 to a text file.

I am able to see the output with:

for x,y in RDD2.items():print(x,y)

But when I tried to save it to a text file using RDD2.saveAsTextFile(\path), it did not work.

It threw: AttributeError: 'collections.defaultdict' object has no attribute 'saveAsTextFile'

Can you please help me understand whether I am missing something here, or how to save the countByValue() result to a text file?

1 ACCEPTED SOLUTION

@Viswa

countByValue() returns its result as a local Map collection, not an RDD. In PySpark that collection is a collections.defaultdict, which is why the AttributeError mentions it.

saveAsTextFile() is defined on an RDD, not on a map/collection.

Even though you have named the variable RDD2, as shown below, it does not hold an RDD:

RDD2 = RDD1.countByValue()

Here are the definitions from the Scala API:

def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long]

Return the count of each unique value in this RDD as a local map of (value, count) pairs.

def saveAsTextFile(path: String): Unit

Save this RDD as a text file, using string representations of elements.
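If you want to keep countByValue(), one possible workaround is to turn the local dictionary back into an RDD before saving. The following is only a minimal sketch: format_counts and save_counts are hypothetical helper names, sc is assumed to be an existing SparkContext, rdd an RDD of '|'-delimited strings, and the tab-separated output format is an arbitrary choice.

```python
def format_counts(counts):
    """Turn a {value: count} mapping into sorted 'value<TAB>count' lines."""
    return ["{}\t{}".format(value, count)
            for value, count in sorted(counts.items())]


def save_counts(sc, rdd, path):
    """Count values with countByValue(), then write them out through Spark.

    Assumptions: sc is an existing SparkContext, rdd holds '|'-delimited
    strings, and path is an output directory that does not exist yet.
    """
    # countByValue() is an action: the result is a local dict on the driver.
    counts = rdd.flatMap(lambda x: x.split('|')).countByValue()
    # parallelize() wraps the local list in an RDD, which has saveAsTextFile().
    sc.parallelize(format_counts(counts)).saveAsTextFile(path)
```

Because the whole dictionary is collected on the driver, this only makes sense when the number of distinct values is small enough to fit in driver memory.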


Expert Contributor

@Dinesh Chitlangia

Thank you for the explanation. In that case I would rather use reduceByKey() to get the number of occurrences.

Thanks for the info on countByValue().
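For reference, the reduceByKey() approach could look roughly like the sketch below. The result stays an RDD throughout, so saveAsTextFile() is available. The helper names to_pair, to_line and save_counts_distributed are hypothetical, rdd is assumed to be an RDD of '|'-delimited strings, and the tab-separated output format is an arbitrary choice.

```python
from operator import add


def to_pair(value):
    """Map each value to a (value, 1) pair so the ones can be summed per key."""
    return (value, 1)


def to_line(pair):
    """Format a (value, count) pair as one 'value<TAB>count' output line."""
    value, count = pair
    return "{}\t{}".format(value, count)


def save_counts_distributed(rdd, path):
    """Count occurrences with reduceByKey() and save the result as text.

    Assumptions: rdd holds '|'-delimited strings and path is an output
    directory that does not exist yet.
    """
    (rdd.flatMap(lambda x: x.split('|'))
        .map(to_pair)
        .reduceByKey(add)      # transformation: the result is still an RDD
        .map(to_line)
        .saveAsTextFile(path))
```

Unlike countByValue(), nothing is collected on the driver here, so this also works when there are many distinct values.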
