
Cannot run multiple lines PySpark


New Contributor

Hi,

I am trying to work through the HANDS-ON TOUR OF APACHE SPARK IN 5 MINUTES tutorial with the Python interpreter, but when I try to run multiple lines, like this part:

%pyspark

myLines=sc.textFile('hdfs://sandbox.hortonworks.com/tmp/Hortonworks')

myLinesFiltered=myLines.filter(lambda x: len(x)>0)

count=myLinesFiltered.count()

print count

I get a syntax error at the end of the second line. If I run the code line by line it works fine, but as soon as I try to run two lines together I always get a syntax error, no matter what I run.

Thanks,

Zsoka

1 ACCEPTED SOLUTION


Re: Cannot run multiple lines PySpark

There is an open Apache bug for this:

PySpark Doesn't Support Multi-Line Statements

https://issues.apache.org/jira/browse/ZEPPELIN-84



Re: Cannot run multiple lines PySpark

Guru

https://issues.apache.org/jira/browse/ZEPPELIN-84 is about breaking a single statement across multiple lines, which is a different case from running several statements in one paragraph.
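To make the distinction concrete, here is a small sketch in plain Python (no Spark or Zeppelin needed). The helper name `count_statements` and the sample strings are made up for illustration; the sketch only shows how Python itself tells one statement spread over several lines apart from several one-line statements, not how Zeppelin parses a paragraph.

```python
import ast

# One statement broken across several lines (the kind of input the bug is about).
one_statement = """total = (1 +
         2 +
         3)
"""

# Several complete statements, one per line (the kind of input in the question).
several_statements = """a = 1
b = 2
c = a + b
"""

def count_statements(source):
    """Return the number of top-level statements in a piece of Python source."""
    return len(ast.parse(source).body)

print(count_statements(one_statement))       # 1
print(count_statements(several_statements))  # 3
```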

@Zsoka Kovacs, you should be able to run the paragraph below. Do not put extra newlines (\n) between the statements, and make sure no extra characters were copied at the end of any line.

{code}
%pyspark
myLines=sc.textFile('/tmp/Hortonworks')
myLinesFiltered=myLines.filter(lambda x: len(x)>0)
count=myLinesFiltered.count()
print count
{code}
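If the paragraph was copied from a web page, one way to apply the advice above is to clean the text before pasting it. This is only a sketch: `clean_paragraph` is a hypothetical helper, and the guess that stray blank lines or non-breaking spaces are what trips the interpreter is an assumption, not something confirmed in this thread.

```python
def clean_paragraph(text):
    """Drop blank lines and trailing whitespace from pasted code.

    Non-breaking spaces (U+00A0) are replaced with ordinary spaces first,
    since they often sneak in when copying from a browser (assumption).
    Note that this also changes non-breaking spaces inside string literals.
    """
    cleaned = []
    for line in text.splitlines():
        line = line.replace("\u00a0", " ").rstrip()
        if line:  # keep only lines that still have content
            cleaned.append(line)
    return "\n".join(cleaned)


# Sample input with a blank line between statements and trailing debris.
pasted = (
    "%pyspark\n"
    "\n"
    "myLines = sc.textFile('/tmp/Hortonworks')\u00a0\n"
    "\n"
    "count = myLines.count()  \n"
)

print(clean_paragraph(pasted))
```

The cleaned text can then be pasted into the notebook paragraph as a single block.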
