Getting Error while executing this command

>>> rdd = sc.parallelize(r1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  c = list(c)  # Make it a list so we can compute its length
TypeError: 'PipelinedRDD' object is not iterable

My commands are:

>>> R = sc.textFile(filename)
>>> R.collect()
>>> r1 = R.map(lambda s: s.split(","))
>>> r1.collect()
>>> rdd = sc.parallelize(r1)
Accepted Solutions

Re: Getting Error while executing this command

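R is an RDD, so r1 (the result of R.map(...)) is also an RDD.

You are passing that RDD to parallelize(), which is not what it is for: parallelize() distributes a local Python collection, such as a list, across the cluster. As the traceback shows, it first calls list() on its argument, and an RDD is not iterable, hence the TypeError. Since r1 is already distributed, you can keep working with it directly instead of parallelizing it again.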

Re: Getting Error while executing this command

Additionally, if you want to change the number of partitions (and hence the parallelism) of an existing RDD, you can use:

rdd.repartition(8)

See the comments and tests here: https://community.hortonworks.com/questions/5825/best-way-to-select-distinct-values-from-multiple-c....