not able to save countByValue() RDD to textFile - pyspark

Expert Contributor

Hi Team,

I am trying to count the number of occurrences of each value in a file.

RDD1 = RDD.flatMap(lambda x: x.split('|'))

RDD2 = RDD1.countByValue()

I want to save the output of RDD2 to a text file.

I am able to see the output with:

for x, y in RDD2.items(): print(x, y)

but when I try to save it to a text file using RDD2.saveAsTextFile(\path), it does not work.

It throws: AttributeError: 'collections.defaultdict' object has no attribute 'saveAsTextFile'

Can you please help me understand whether I am missing something here, or how to save the countByValue() result to a text file?

Accepted Solution

Re: not able to save countByValue() RDD to textFile - pyspark

@Viswa

countByValue() is an action: it returns its result to the driver as a local Map collection (in PySpark, a Python dict), not as an RDD.

saveAsTextFile() is defined only on RDDs, not on maps or other local collections.

Even though you named the variable RDD2, as shown below, it does not hold an RDD:

RDD2 = RDD1.countByValue()

Here are the relevant definitions from the Scala RDD API:

def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long]

Return the count of each unique value in this RDD as a local map of (value, count) pairs.

def saveAsTextFile(path: String): Unit

Save this RDD as a text file, using string representations of elements.
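Because countByValue() already hands back a plain Python dict on the driver, one option is to skip Spark for the write step and use ordinary file I/O. A minimal sketch, assuming the result fits comfortably in driver memory; the helper name save_counts and the tab-separated output format are illustrative choices, not part of any Spark API:

```python
from collections import defaultdict


def save_counts(counts, path):
    """Write a {value: count} mapping to a local text file as 'value<TAB>count' lines.

    `counts` is assumed to be the dict returned by RDD.countByValue(),
    which lives on the driver, so plain Python file I/O is enough.
    """
    with open(path, "w", encoding="utf-8") as f:
        # Sort by value so the output order is deterministic
        # (assumes the values are mutually comparable, e.g. all strings).
        for value, count in sorted(counts.items()):
            f.write(f"{value}\t{count}\n")


if __name__ == "__main__":
    # Stand-in for RDD1.countByValue(); PySpark returns a defaultdict(int).
    RDD2 = defaultdict(int)
    for token in "a|b|a|c|a".split("|"):
        RDD2[token] += 1
    save_counts(RDD2, "counts.txt")
```

This writes to the driver's local filesystem, not to HDFS. If the file has to end up in distributed storage, the dict can be turned back into an RDD first, for example sc.parallelize(list(RDD2.items())).saveAsTextFile(path), which writes the string form of each (value, count) tuple.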

Re: not able to save countByValue() RDD to textFile - pyspark

Expert Contributor

@Dinesh Chitlangia

Thank you for the explanation. In that case I would rather use reduceByKey() to get the number of occurrences.

Thanks for the info on countByValue().
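For reference, the reduceByKey() route keeps the counts distributed as an RDD, so saveAsTextFile() can be called on the result directly. A sketch, assuming a local SparkContext and hypothetical paths input.txt and counts_out; the helper names to_pairs, format_pair and count_and_save are illustrative only:

```python
from operator import add


def to_pairs(line):
    """Split a '|'-delimited line and emit a (token, 1) pair per token."""
    return [(token, 1) for token in line.split("|")]


def format_pair(pair):
    """Render a (token, count) pair as a 'token<TAB>count' line."""
    token, count = pair
    return f"{token}\t{count}"


def count_and_save(sc, input_path, output_path):
    counts = (
        sc.textFile(input_path)
        .flatMap(to_pairs)   # ('a', 1), ('b', 1), ('a', 1), ...
        .reduceByKey(add)    # ('a', 2), ('b', 1), ... still an RDD
        .map(format_pair)
    )
    # Works because `counts` is an RDD, unlike the dict from countByValue().
    counts.saveAsTextFile(output_path)


if __name__ == "__main__":
    # Imported lazily so the helper functions above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "count-occurrences")
    try:
        count_and_save(sc, "input.txt", "counts_out")  # hypothetical paths
    finally:
        sc.stop()
```

saveAsTextFile() treats the output path as a directory of part files and fails if that directory already exists, so the job needs a fresh output path on each run.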
