Cannot run multiple lines PySpark
Labels: Apache Zeppelin
Created ‎03-24-2017 10:08 AM
Hi,
I am trying to work through the "Hands-on Tour of Apache Spark in 5 Minutes" tutorial with the Python interpreter, but I run into a problem when I try to run multiple lines in one paragraph, like this part:
%pyspark
myLines=sc.textFile('hdfs://sandbox.hortonworks.com/tmp/Hortonworks')
myLinesFiltered=myLines.filter(lambda x: len(x) > 0)
count=myLinesFiltered.count()
print count
I get a syntax error at the end of the second line. If I run it line by line it works fine, but if I try to run two or more lines together, I always get a syntax error, no matter what I run.
Thanks,
Zsoka
Created ‎03-24-2017 04:23 PM
There is an open Apache bug for this:
PySpark Doesn't Support Multi-Line Statements
Created ‎03-24-2017 09:59 PM
https://issues.apache.org/jira/browse/ZEPPELIN-84 is about breaking a single statement across multiple lines, which is a different case from running several one-line statements in one paragraph.
@Zsoka Kovacs, you should be able to run the paragraph below. Do not leave extra blank lines (\n) between the statements, and make sure no extra characters were copied at the end of any line.
{code}
%pyspark
myLines=sc.textFile('/tmp/Hortonworks')
myLinesFiltered=myLines.filter(lambda x: len(x) > 0)
count=myLinesFiltered.count()
print count
{code}
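If you suspect stray characters came along when copying from the tutorial page, here is a rough sketch of how you could clean the text in plain Python before pasting it into the notebook. The helper name `clean_pasted_code` and the list of invisible characters are my own assumptions about what might have been copied; this is not a Zeppelin feature, just a way to rule out the copy/paste problem.

```python
import re

# Zero-width and directional marks that can be picked up when copying
# from a web page (assumed list, extend it if you find others).
ZERO_WIDTH = re.compile(u"[\u200b\u200c\u200d\u200e\u200f\ufeff]")


def clean_pasted_code(text):
    """Hypothetical helper: normalise code copied from a web page.

    Replaces non-breaking spaces with normal spaces, removes zero-width
    characters, strips trailing whitespace, and drops blank lines.
    """
    cleaned = []
    for line in text.splitlines():
        line = line.replace(u"\u00a0", u" ")  # non-breaking space -> space
        line = ZERO_WIDTH.sub(u"", line)      # remove invisible characters
        line = line.rstrip()                  # remove trailing whitespace
        if line:                              # skip lines that are now empty
            cleaned.append(line)
    return u"\n".join(cleaned)


# Example input with a zero-width space, trailing spaces, a blank line,
# and a non-breaking space inside the lambda.
pasted = (
    u"%pyspark\n"
    u"myLines=sc.textFile('/tmp/Hortonworks')\u200b  \n"
    u"\n"
    u"myLinesFiltered=myLines.filter(lambda\u00a0x: len(x) > 0)\n"
    u"count=myLinesFiltered.count() \n"
    u"print count\n"
)
print(clean_pasted_code(pasted))
```

The helper only treats the paragraph as text, so it does not matter that the sample uses the Python 2 style `print count` from the tutorial. If the cleaned paragraph still fails in Zeppelin, the cause is probably something other than copied characters.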
