Python Streaming
Labels: Apache Hadoop, MapReduce, Quickstart VM
Created on 07-31-2013 09:07 PM - edited 09-16-2022 01:46 AM
I'm trying to use my local installation of the Cloudera Quickstart VM to run a small MapReduce job in Python.
My test works when I explicitly invoke python on the scripts:
# cat inputfile.txt | python mymapper.py | sort | python myreducer.py
I need to add python to the path in the VM. What's the best way to do this so that python is found both from the command line and by Hadoop? I haven't been able to find and modify the right files in the Cloudera VM.
(I was able to run this on AWS.) I also tried it from the Hadoop command line:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar \
-input inputfile.txt \
-output output010 \
-mapper mymapper.py \
-file mymapper.py \
-combiner myreducer.py \
-reducer myreducer.py \
-file myreducer.py
...and it fails.
Any help getting this right would be appreciated.
thanks,
jp
Created 08-01-2013 10:07 AM
jp,
Try adding the header "#!/usr/bin/env python" as the first line of your scripts. This tells the operating system to run them with the Python interpreter (env looks python up on your PATH). If you do this in your local example (and run "chmod +x *.py"), it works without having to call python explicitly:
cat inputfile.txt | ./mymapper.py | sort | ./myreducer.py
Then rerun the streaming job. The -file options ship the updated local copies of the scripts with the job, so MapReduce will be able to execute your mappers and reducers.
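The thread never shows jp's actual scripts, so here is a hypothetical word-count mapper, sketched only to illustrate the two points above: the "#!/usr/bin/env python" first line and reading input from standard input. The word-count logic and the function name `map_lines` are assumptions, not jp's code; it is written so it runs under either Python 2 or Python 3.

```python
#!/usr/bin/env python
# mymapper.py - hypothetical word-count mapper for Hadoop Streaming.
# The shebang line above lets the OS run this file directly (./mymapper.py)
# once it has been marked executable with "chmod +x mymapper.py".
import sys


def map_lines(lines):
    """Yield one "word<TAB>1" record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


def main():
    # Hadoop Streaming (and a local "cat ... |" pipeline) feeds input on stdin.
    for record in map_lines(sys.stdin):
        sys.stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

Save it with Unix (LF) line endings, run "chmod +x mymapper.py", and check it locally with "cat inputfile.txt | ./mymapper.py | sort".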
Created 08-01-2013 01:43 PM
Thanks. I rebooted, recreated the files, tried both #!/usr/bin/env python and #!/usr/bin/python again, and added execute permission (+x).
The pipeline gets through cat, mymapper.py, and sort, but I get "no such file or directory" when I pipe the output to ./myreducer.py.
But when I explicitly run it with "python", it works.
I'm guessing this is some obvious newbie issue (I'm new to Linux), but I should have this in the bag by now.
jp
Created 08-01-2013 02:04 PM
If you see something similar to "bash: ./myreducer.py: No such file or directory" your typo is in the path or filename of the reducer script.
But if you see "bad interpreter" in the error, the interpreter path in the script's #! line is incorrect.
If you have a hard time finding a typo, try pasting here the output of "ls -l", your exact command and its exact output, and possibly your scripts as well. In a Linux terminal window, Ctrl + Shift + C and Ctrl + Shift + V copy and paste.
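To tell these cases apart, it can help to inspect the script's first line directly. The sketch below is a hypothetical helper (the file name check_shebang.py and the function names are illustrative, not an existing tool): it flags a #! line that ends in a carriage return, which is what a file saved with Windows line endings has, and checks that the interpreter path exists.

```python
#!/usr/bin/env python
# check_shebang.py - hypothetical helper for diagnosing "bad interpreter"
# and "No such file or directory" errors when running ./script.py.
import os
import sys


def check_shebang_line(first_line):
    """Return a list of problems with a script's first line (given as bytes)."""
    if not first_line.startswith(b"#!"):
        return ["first line is not a #! line"]
    problems = []
    # A file edited on Windows ends its lines with CR+LF; the CR then becomes
    # part of the interpreter name the kernel tries to run (e.g. "python\r").
    if first_line.rstrip(b"\n").endswith(b"\r"):
        problems.append("#! line ends with a carriage return (Windows CR+LF line ending)")
    words = first_line[2:].split()
    if not words:
        problems.append("#! line names no interpreter")
    else:
        interpreter = words[0].decode("utf-8", "replace")
        if not os.path.isfile(interpreter):
            problems.append("interpreter not found: " + interpreter)
    return problems


def check_script(path):
    """Read the first line of the file at `path` and check it."""
    with open(path, "rb") as f:
        return check_shebang_line(f.readline())


if __name__ == "__main__":
    # Usage: python check_shebang.py mymapper.py myreducer.py
    for script in sys.argv[1:]:
        for problem in check_script(script) or ["looks OK"]:
            print("%s: %s" % (script, problem))
```

Note that for "#!/usr/bin/env python" the path checked is /usr/bin/env; env then searches PATH for python itself, so "which python" is the complementary check.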
Created 08-01-2013 09:08 PM
JP,
One other thought, which may be off track since I can't see the command-line output Sean mentioned: check the permissions on the myreducer.py script. To run it directly as ./myreducer.py, with the sorted data piped to it as input, the file must be executable. You can make sure it is by running "chmod 755 myreducer.py".
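The "755" in that command is an octal mode: each digit is read (4) + write (2) + execute (1), for owner, group, and others in that order, so 7 means rwx and 5 means r-x. The sketch below spells that out with Python's stat constants; the helper names are hypothetical, and in practice the shell chmod command is all you need.

```python
import os
import stat

# "chmod 755" = rwxr-xr-x: owner read/write/execute, group and others read/execute.
MODE_755 = (stat.S_IRWXU                    # owner:  rwx (0o700)
            | stat.S_IRGRP | stat.S_IXGRP   # group:  r-x (0o050)
            | stat.S_IROTH | stat.S_IXOTH)  # others: r-x (0o005)


def make_executable(path):
    """Python equivalent of running "chmod 755 <path>"."""
    os.chmod(path, MODE_755)


def is_executable(path):
    """True if the current user may execute the file, which ./script requires."""
    return os.access(path, os.X_OK)
```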
HTH,
Clint
Created 08-01-2013 09:30 PM
Thanks. I had done this through the Properties screen of the file browser, but I tried again with the command you supplied. Still no luck: the process completes but outputs nothing, even with a plain vanilla reducer that just echoes the mapper output.
jp
Created 08-01-2013 09:52 PM
Odd. I take it your reducer reads the standard input that's being piped to it? Something like:
import sys
for line in sys.stdin:
    sys.stdout.write(line)  # plain echo; real reducer logic goes here
Also, as Sean indicated, if we could get pastes of your source code and also the actual command-line output/errors you are seeing, that would round out the picture for us.
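Going one step past the echo loop, here is a hypothetical word-count reducer that would pair with a "word<TAB>1" mapper; the logic and function names are assumptions, since jp's scripts were never posted. Because the input is sorted by key (by sort locally, or by Hadoop's shuffle), identical keys arrive next to each other, which is why itertools.groupby is enough. Its output uses the same "key<TAB>count" format as its input, so it could also serve as the -combiner in jp's command.

```python
#!/usr/bin/env python
# myreducer.py - hypothetical word-count reducer for Hadoop Streaming.
# Input arrives on stdin as "key<TAB>count" lines, already sorted by key.
import sys
from itertools import groupby


def parse(lines):
    """Turn "key<TAB>count" lines into (key, int) pairs, skipping malformed ones."""
    for line in lines:
        key, sep, value = line.rstrip("\r\n").partition("\t")
        if not sep:
            continue  # no tab on this line
        try:
            yield key, int(value)
        except ValueError:
            continue  # count is not an integer


def reduce_pairs(pairs):
    """Sum the counts of each run of identical, adjacent keys."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def main():
    for key, total in reduce_pairs(parse(sys.stdin)):
        sys.stdout.write("%s\t%d\n" % (key, total))


if __name__ == "__main__":
    main()
```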
Thanks,
Clint
Created 08-06-2013 05:36 PM
It took me a while to figure out. I just got it a minute ago.
I was running scripts that I developed in Windows (where end-of-line = CR+LF). I needed to strip out the CR so that, when running /usr/bin/env python, Linux would look for an interpreter called "python" and not "python\r".
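If dos2unix isn't available, a few lines of Python can do the same conversion (GNU sed's "sed -i 's/\r$//' myreducer.py" is another common option). A minimal sketch, assuming the scripts are small enough to read into memory; the file name crlf_to_lf.py and the function names are hypothetical. It only rewrites CR+LF pairs, not old Mac-style lone CRs.

```python
#!/usr/bin/env python
# crlf_to_lf.py - hypothetical helper: convert Windows (CR+LF) line endings
# to Unix (LF) in place, similar to dos2unix. Reads each whole file into memory.
import sys


def to_unix_line_endings(data):
    """Replace every CR+LF pair in `data` (bytes) with a bare LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite `path` with LF line endings; return True if anything changed."""
    with open(path, "rb") as f:
        original = f.read()
    converted = to_unix_line_endings(original)
    if converted != original:
        with open(path, "wb") as f:
            f.write(converted)
    return converted != original


if __name__ == "__main__":
    # Usage: python crlf_to_lf.py mymapper.py myreducer.py
    for name in sys.argv[1:]:
        if convert_file(name):
            print("converted " + name)
```

Run it with "python crlf_to_lf.py ..." explicitly, so it works even if the helper itself was saved with Windows line endings.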
Now I can move on.
jp
Created 08-06-2013 07:09 PM
Ouch, the old Windows-Linux end-of-line conversion problem strikes again! Thanks for closing the loop with us, jp. Glad it's resolved!
Created 08-08-2013 11:40 AM
Nice!
FYI: editing the script directly in Hue's File Browser also does this cleanup!
