Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Getting Error while executing this command

avatar
rdd = sc.parallelize(r1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  c = list(c)  # Make it a list so we can compute its length
TypeError: 'PipelinedRDD' object is not iterable

~~~~~~~~~~~~~~~~~My commands are ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>>> R = sc.textFile(filename);
>>> R.collect()
>>> r1 = R.map(lambda s: s.split(","))
>>> r1.collect()
>>> rdd = sc.parallelize(r1)
1 ACCEPTED SOLUTION

avatar
Explorer

R is an RDD. So r1 is also an RDD.

So you are trying to call "parallelize()" on an RDD, where you should not do that. Usually, use parallelize() on a local python object, like a list.

View solution in original post

2 REPLIES 2

avatar
Explorer

R is an RDD. So r1 is also an RDD.

So you are trying to call "parallelize()" on an RDD, where you should not do that. Usually, use parallelize() on a local python object, like a list.

avatar

Additionally, if you want to change number of partitions (and then parallelism) of an existing RDD, you can use

rdd.repartition(8)

See the comments and tests from here: https://community.hortonworks.com/questions/5825/best-way-to-select-distinct-values-from-multiple-c....