question Python Streaming in Support Questions

Python Streaming

bigpalooka — Fri, 16 Sep 2022 08:46:22 GMT

I'm trying to use my local installation of Cloudera Quickstart VM to do a small mapreduce job in Python.

My test script works when I explicitly add python to the script:

# cat inputfile.txt | python mymapper.py | sort | python myreducer.py

I need to add python to the path in the vm. What's the best way to do this so it finds python from the command line and in Hadoop? I haven't been successful trying to find and modify the right files in the Cloudera VM.
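Before editing any system files, it can help to see which interpreter your shell actually resolves. The following is a minimal sketch using `shutil.which` (Python 3.3 or later). The helper name `find_python` is hypothetical. A Hadoop task may also run with a different `PATH` than your login shell, so a result here does not guarantee the same result inside a job.

```python
#!/usr/bin/env python3
# Hypothetical helper (find_python is not a library function): report which
# interpreter names are visible on the current PATH.
import os
import shutil


def find_python(names=("python", "python3")):
    """Map each candidate name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("PATH = " + os.environ.get("PATH", ""))
    for name, path in sorted(find_python().items()):
        print("{0}: {1}".format(name, path or "not found on PATH"))
```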

(I was able to run this on AWS. I tried from the hadoop command line also:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar \
-input inputfile.txt \
-output output010 \
-mapper mymapper.py \
-file mymapper.py \
-combiner myreducer.py \
-reducer myreducer.py \
-file myreducer.py

... and it fails)

Any help getting this right would be appreciated.

thanks,

Re: Python Streaming

Sean — Thu, 01 Aug 2013 17:07:05 GMT

jp,

Try inserting the header "#!/usr/bin/env python" as the first line of your scripts. This tells the operating system to run your scripts with the Python interpreter. If you do this in your local example (and run "chmod +x *.py"), it works without having to put "python" in front of each script:

cat inputfile.txt | ./mymapper.py | sort | ./myreducer.py
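For reference, here is a minimal sketch of what such an executable mapper could look like. The word-count logic and the `map_lines` helper are hypothetical stand-ins, because the contents of `mymapper.py` aren't shown. The parts that matter here are the shebang on the very first line and reading records from standard input.

```python
#!/usr/bin/env python
# The shebang above must be the first line of the file (ending in a plain LF)
# for "./mymapper.py" to work after "chmod +x mymapper.py".
# map_lines and the word-count logic are hypothetical placeholders.
import sys


def map_lines(lines):
    """Yield one "word<TAB>1" record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "{0}\t{1}".format(word, 1)


if __name__ == "__main__":
    for record in map_lines(sys.stdin):
        sys.stdout.write(record + "\n")
```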

Copy the modified files back into HDFS and MapReduce will now be able to execute your mappers and reducers.

Re: Python Streaming

bigpalooka — Thu, 01 Aug 2013 20:43:20 GMT

Thanks. I rebooted, recreated the files, again tried both #!/usr/bin/env python and #!/usr/bin/python, and changed the permissions to include execute (+x).

I'm making it through the file, mymapper, and sort, but I'm getting "no such file or directory" when I pipe it to ./myreducer.py

But when I explicitly add "python" as the executable it works.

I'm guessing this is some obvious newbie issue (new to linux) but I should have this in the bag by now.

Re: Python Streaming

Sean — Thu, 01 Aug 2013 21:04:00 GMT

It sounds like you may have a typo in one of the file paths.

If you see something similar to "bash: ./myreducer.py: No such file or directory" your typo is in the path or filename of the reducer script.

But if you see "bad interpreter" in the error, it means the path you're using to point to python is incorrect.

If you have a hard time finding a typo, try copying and pasting the output of "ls -l", your exact command and its exact output, and possibly your scripts as well. In Linux terminal windows, Ctrl + Shift + C and Ctrl + Shift + V copy and paste.
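To tell the "No such file or directory" and "bad interpreter" cases apart without guessing, you can inspect a script's first line directly. The following is a minimal diagnostic sketch. `check_shebang` is a hypothetical helper, not part of any library, and it only flags the common problems: a missing `#!`, a Windows CR at the end of the line, an interpreter path that doesn't exist, and a missing execute bit.

```python
#!/usr/bin/env python3
# Hypothetical diagnostic helper: check_shebang is not part of any library.
import os
import shutil
import sys


def check_shebang(path):
    """Return a list of likely problems with the script's #! line and permissions."""
    with open(path, "rb") as f:
        first = f.readline()
    if not first.startswith(b"#!"):
        return ["first line is not a #! shebang"]
    problems = []
    if first.endswith(b"\r\n"):
        # The trailing \r becomes part of the interpreter (or env argument) name
        problems.append("shebang ends with CRLF (Windows line endings)")
    parts = first[2:].decode("utf-8", "replace").split()
    if not parts:
        problems.append("shebang names no interpreter")
    elif not os.path.isfile(parts[0]):
        problems.append("interpreter not found: " + parts[0])
    elif os.path.basename(parts[0]) == "env" and len(parts) > 1:
        # /usr/bin/env searches PATH for the next word
        if shutil.which(parts[1]) is None:
            problems.append("env cannot find on PATH: " + parts[1])
    if not os.access(path, os.X_OK):
        problems.append("file is not executable (try chmod +x)")
    return problems


if __name__ == "__main__":
    for script in sys.argv[1:]:
        issues = check_shebang(script)
        print("{0}: {1}".format(script, "; ".join(issues) or "no obvious problems"))
```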

Re: Python Streaming

Clint — Fri, 02 Aug 2013 04:08:34 GMT

JP,

One other thought, which may be off track (since I can't see the command-line output Sean mentioned, I'm just guessing): you might want to check the permissions on the reducer.py script. For it to accept the piped, sorted data as input, it must be executable. You can make sure it is executable by running "chmod 755 reducer.py" on the file.
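The mode 755 reads as three octal digits: 7 (rwx) for the owner, 5 (r-x) for the group, and 5 (r-x) for everyone else. If you prefer to do this from Python, the following minimal sketch is equivalent. `make_executable` is a hypothetical helper name.

```python
#!/usr/bin/env python3
# Hypothetical helper (make_executable is not a library function):
# the Python equivalent of "chmod 755 <file>".
import os
import stat
import sys


def make_executable(path):
    """Set mode 0o755 (rwxr-xr-x) and return the resulting permission bits."""
    os.chmod(path, 0o755)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    for script in sys.argv[1:]:
        print("{0}: {1:o}".format(script, make_executable(script)))
```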

HTH,

Clint

Re: Python Streaming

bigpalooka — Fri, 02 Aug 2013 04:22:34 GMT

I renamed my mapper and reducer to jpm.py and jpr.py to make sure my spelling is right. The reducer part of the "cat" pipeline doesn't work unless it's preceded by "python". Then it completes successfully.

In hadoop map-reduce, from the command line, I've gotten the process to complete, but it yields no results. I reduced the reducer functionality to just pass on what comes from the mapper. It completes, but doesn't yield any results in the output (file size = 0). I removed the reducer completely and I get what I expect from the mapper.
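A pass-through reducer like the one described only needs to copy standard input to standard output. The following minimal sketch adds one debugging aid. Hadoop streaming treats stderr lines of the form `reporter:counter:<group>,<counter>,<amount>` as counter updates, so reporting how many lines arrived can show whether the reducer received any input at all. The function name `passthrough` and the counter names `Debug`/`LinesSeen` are hypothetical.

```python
#!/usr/bin/env python
# Hypothetical pass-through reducer; passthrough, Debug and LinesSeen are
# illustrative names only.
import sys


def passthrough(lines, out):
    """Copy every input line to out unchanged and return how many were copied."""
    count = 0
    for line in lines:
        out.write(line)
        count += 1
    return count


if __name__ == "__main__":
    n = passthrough(sys.stdin, sys.stdout)
    # Written to stderr so the count does not end up in the job output.
    sys.stderr.write("reporter:counter:Debug,LinesSeen,{0}\n".format(n))
```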

I'd like to progress to the GUIs and get a taste of Pig and Hive in Cloudera by the end of the month. I think I'm going to start all over again with a fresh VM.

Re: Python Streaming

bigpalooka — Fri, 02 Aug 2013 04:30:36 GMT

Thanks. I did this through the properties screen of the file browser, but I tried it again with the command you supplied. Still no luck: the process completes but outputs nothing, even with a plain vanilla reducer (echoing the mapper output).

Re: Python Streaming

Clint — Fri, 02 Aug 2013 04:52:06 GMT

Odd. I take it you're doing something in your reducer that's smart about reading the "standard input" that's being piped to it? Something like:

for line in sys.stdin:
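Fleshed out, a reducer built on that loop might look like the following sketch. It assumes `key<TAB>value` input with integer values, as a word-count-style job would produce. `parse` and `reduce_sums` are hypothetical names. `itertools.groupby` only merges adjacent lines, so the sketch relies on the input already being sorted by key. Hadoop's shuffle provides that sorting, as does `sort` in the local pipeline.

```python
#!/usr/bin/env python
# Hypothetical summing reducer; parse and reduce_sums are illustrative names.
import sys
from itertools import groupby


def parse(lines, sep="\t"):
    """Split "key<TAB>value" lines into (key, value) pairs, skipping malformed lines."""
    for line in lines:
        parts = line.rstrip("\n").split(sep, 1)
        if len(parts) == 2:
            yield parts[0], parts[1]


def reduce_sums(lines):
    """Sum integer values per key; assumes input is already sorted by key."""
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


if __name__ == "__main__":
    for key, total in reduce_sums(sys.stdin):
        sys.stdout.write("{0}\t{1}\n".format(key, total))
```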

Also, as Sean indicated, if we could get pastes of your source code and also the actual command-line output/errors you are seeing, that would round out the picture for us.

Thanks,

Clint

Re: Python Streaming

bigpalooka — Wed, 07 Aug 2013 00:36:23 GMT

It took me a while to figure out. I just got it a minute ago.

I was running scripts that I developed in Windows (where end-of-line = CR+LF). I needed to strip out the CR. Otherwise the shebang line is read as "#!/usr/bin/env python\r", and Linux looks for an interpreter named "python\r" instead of "python".
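Tools such as `dos2unix` handle this conversion. If you'd rather use Python, the following minimal sketch does the same thing in place. The name `crlf_to_lf` is hypothetical. Because the file is rewritten rather than replaced, its existing permissions (including the execute bit) are kept.

```python
#!/usr/bin/env python3
# Hypothetical helper (crlf_to_lf is not a library function): convert Windows
# CRLF line endings to Unix LF in place, similar to dos2unix.
import sys


def crlf_to_lf(path):
    """Rewrite path with every CRLF replaced by LF; return True if anything changed."""
    with open(path, "rb") as f:
        data = f.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        # Opening the existing file for writing keeps its mode bits
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != data


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print("{0}: {1}".format(name, "converted" if crlf_to_lf(name) else "already LF"))
```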

Now I can move on.

Re: Python Streaming

Clint — Wed, 07 Aug 2013 02:09:40 GMT

Ouch, the old Windows-Linux end-of-line character conversion problem strikes again! Thanks for closing the loop with us, jp, glad it's resolved!

Re: Python Streaming

Romainr — Thu, 08 Aug 2013 18:40:12 GMT

Nice!

For information: editing the script directly in the File Browser in Hue does this cleaning too!

Re: Python Streaming

charmeep — Wed, 14 Aug 2013 20:12:18 GMT

Hi,

I have a similar problem. I wrote a simple mapper and reducer to read an input file and count the total number of lines.

This works great locally:

cat access.log | ./linecount_mapper.py | ./linecount_reduce.py
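The linecount scripts themselves aren't shown. The following is a hypothetical sketch of such a pair, with both roles combined in one file named `linecount.py` for brevity. The names `linecount.py`, `count_mapper` and `count_reducer` are illustrative. Each mapper emits a single `lines<TAB>n` record for its share of the input, and the reducer sums those counts. Because every record uses the same key, the reducer works with or without a `sort` step.

```python
#!/usr/bin/env python
# Hypothetical line counter; linecount.py, count_mapper and count_reducer are
# illustrative names. Usage: "linecount.py map" or "linecount.py reduce".
import sys


def count_mapper(lines):
    """Emit one "lines<TAB>n" record with the number of lines this mapper saw."""
    n = sum(1 for _ in lines)
    yield "lines\t{0}".format(n)


def count_reducer(records):
    """Sum the per-mapper counts; assumes non-blank records are key<TAB>count."""
    total = 0
    for record in records:
        if not record.strip():
            continue  # ignore blank lines
        _, value = record.rstrip("\n").split("\t", 1)
        total += int(value)
    yield "lines\t{0}".format(total)


if __name__ == "__main__":
    step = count_mapper if sys.argv[1:] == ["map"] else count_reducer
    for out in step(sys.stdin):
        sys.stdout.write(out + "\n")
```

Locally this would run as `cat access.log | ./linecount.py map | ./linecount.py reduce`.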

With the same input files and scripts, the streaming job returns this error message. Any suggestions?

java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja

Thanks,

Charmee

Re: Python Streaming

charmeep — Wed, 14 Aug 2013 20:36:52 GMT

This is how I invoke the MapReduce job:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.3.0.jar \
-input /user/certification/sandbox/access.log -output /user/certification/sandboxout -mapper /user/certification/sandbox/linecount_mapper.py -reducer /user/certification/sandbox/linecount_reduce.py

I also tried using hadoop-streaming.jar, and it gives me the same error.

Any suggestions are greatly appreciated.

Re: Python Streaming

ahegazi — Tue, 26 Aug 2014 01:48:58 GMT

Hi Folks,

Can anyone help me here as well?

I get the same error, as follows:

 /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output2
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar] /tmp/streamjob725052303650188667.jar tmpDir=null
14/08/26 02:44:06 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:06 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:06 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
14/08/26 02:44:06 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
14/08/26 02:44:06 ERROR streaming.StreamJob: Error Launching job : Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
Streaming Command Failed!
[hduser@hdmachine1 ~]$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output op-1
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar] /tmp/streamjob6895399468399805454.jar tmpDir=null
14/08/26 02:44:21 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:21 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:22 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/08/26 02:44:22 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 8e266e052e423af592871e2dfe09d54c03f6a0e8]
14/08/26 02:44:22 INFO mapred.FileInputFormat: Total input paths to process : 1
14/08/26 02:44:22 INFO mapreduce.JobSubmitter: number of splits:2
14/08/26 02:44:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1409004459008_0013
14/08/26 02:44:23 INFO impl.YarnClientImpl: Submitted application application_1409004459008_0013
14/08/26 02:44:23 INFO mapreduce.Job: The url to track the job: http://hdmachine1.example.com:8088/proxy/application_1409004459008_0013/
14/08/26 02:44:23 INFO mapreduce.Job: Running job: job_1409004459008_0013
14/08/26 02:44:27 INFO mapreduce.Job: Job job_1409004459008_0013 running in uber mode : false
14/08/26 02:44:27 INFO mapreduce.Job:  map 0% reduce 0%
14/08/26 02:44:30 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:31 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:34 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:35 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:38 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:39 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:43 INFO mapreduce.Job:  map 100% reduce 100%
14/08/26 02:44:44 INFO mapreduce.Job: Job job_1409004459008_0013 failed with state FAILED due to: Task failed task_1409004459008_0013_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

14/08/26 02:44:44 INFO mapreduce.Job: Counters: 13
	Job Counters 
		Failed map tasks=7
		Killed map tasks=1
		Launched map tasks=8
		Other local map tasks=6
		Rack-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=17357
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=17357
		Total vcore-seconds taken by all map tasks=17357
		Total megabyte-seconds taken by all map tasks=17773568
	Map-Reduce Framework
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
14/08/26 02:44:44 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!

Based on all the previous posts I saw, I tried the following:

1- Replaced #!/usr/bin/env python3.2 with #!/usr/bin/python3.2

2- Copied mapper.py and reducer.py to /tmp and gave both scripts 777 permissions

3- Restarted all Hadoop services

4- Used both streaming jar files, /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar and /usr/lib/hadoop-mapreduce/hadoop-streaming.jar, but they both gave the same error message

Here is the command I use again. I would appreciate it if someone could explain what the hell is wrong with it:

$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output2

Re: Python Streaming

GautamG — Sat, 30 Aug 2014 03:56:12 GMT

The bottom of the stack trace says "Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory". How many nodes are there in this cluster? Have you copied the mapper.py program to all the nodes? It also needs to be executable (chmod 755 mapper.py)

Re: Python Streaming

ahegazi — Mon, 01 Sep 2014 03:04:18 GMT

Thanks for your reply, GautamG.

The cluster has 10 DataNodes and 1 NameNode.

I didn't copy the scripts to all the nodes, because I expected the -files option to copy them to HDFS, where they are reachable from all nodes by default. I am sure the script is executable; I even set its permissions to 777.

Re: Python Streaming

GautamG — Mon, 01 Sep 2014 03:35:37 GMT

Please refer to http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStreaming.html which shows the correct usage. Use -file for each file to be copied across.

For example, try:

$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -file /tmp/mapper.py -file /tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output
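If you build the command from a script, the following sketch is one variant. It assumes that copies shipped with `-file`/`-files` are placed in each task's working directory, which is how the streaming documentation describes those options. For that reason it points `-mapper` and `-reducer` at the base names, not at `/tmp` paths that may exist only on the submitting machine. `streaming_command` is a hypothetical helper. The `extra_files` parameter is for anything else the scripts need at run time, such as a helper module they import.

```python
#!/usr/bin/env python3
# Hypothetical helper (streaming_command is not a Hadoop API): build an argv
# list for hadoop-streaming that ships local scripts with -file.
import os
import shlex


def streaming_command(jar, mapper, reducer, input_path, output_path, extra_files=()):
    """Return the argv list; -mapper/-reducer use the shipped copies' base names."""
    cmd = ["hadoop", "jar", jar]
    for path in (mapper, reducer) + tuple(extra_files):
        cmd += ["-file", path]
    cmd += [
        "-mapper", os.path.basename(mapper),
        "-reducer", os.path.basename(reducer),
        "-input", input_path,
        "-output", output_path,
    ]
    return cmd


if __name__ == "__main__":
    cmd = streaming_command(
        "/usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar",
        "/tmp/mapper.py", "/tmp/reducer.py", "gutenberg/4300.txt", "output",
    )
    # Prints a copy-pasteable shell command; subprocess.run(cmd, check=True)
    # would launch it instead.
    print(" ".join(shlex.quote(part) for part in cmd))
```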

Edit: hyperlink and space char in command

Re: Python Streaming

adityahadoop — Fri, 21 Apr 2017 13:50:19 GMT

Hi,

I have a similar problem: my Python code works fine when I run it locally using the cat command, but it fails when I run it on Hadoop. Below are my code, the error, the command I use to run the program, and the permissions on my files:

Permission on my files:

-rwxrwxr-x 1 cloudera cloudera 690 Apr 20 14:15 flight_mapper.py
-rw-r--r-- 1 cloudera cloudera 2865221 Apr 19 08:21 flight_records.csv
-rwxrwxr-x 1 cloudera cloudera 501 Apr 19 13:39 flight_reducer.py
-rwxrwxrwx 1 cloudera cloudera 1349 Apr 21 06:20 framework.py

Framework.py code:

#! /usr/bin/env python
import os
import re
import sys

from itertools import groupby
from operator import itemgetter

separator = "\t"

class Streaming(object):

    @staticmethod
    def GetJobConf(name):
        # Streaming exports job conf properties as environment variables with
        # non-alphanumeric characters replaced by "_" and the case preserved
        # (e.g. mapreduce.job.id -> mapreduce_job_id).
        name = re.sub(r"[^0-9A-Za-z]", "_", name)
        return os.environ.get(name)

    def __init__(self, infile=sys.stdin, separator=separator):
        self.infile = infile
        self.sep = separator

    def Status(self, message):
        # Streaming reads "reporter:status:<message>" lines from stderr.
        sys.stderr.write("reporter:status:{}\n".format(message))

    def Counter(self, counter, amount=1, group="Python Streaming"):
        # Streaming reads "reporter:counter:<group>,<counter>,<amount>" lines from stderr.
        msg = "reporter:counter:{0},{1},{2}\n".format(group, counter, amount)
        sys.stderr.write(msg)

    def Emit(self, key, value):
        # Write one key<separator>value record to stdout.
        sys.stdout.write("{0}{1}{2}\n".format(key, self.sep, value))

    def Read(self):
        for line in self.infile:
            yield line.rstrip()

    def __iter__(self):
        for line in self.Read():
            yield line

class Mapper(Streaming):

    def Map(self):
        raise NotImplementedError("Mapper must implement a Map method")

class Reducer(Streaming):

    def Reduce(self):
        raise NotImplementedError("Reducer must implement a Reduce method")

    def __iter__(self):
        # Input arrives sorted by key, so groupby yields one
        # (key, iterator of [key, value] lists) pair per distinct key.
        generator = (line.split(self.sep, 1) for line in self.Read())
        for item in groupby(generator, itemgetter(0)):
            yield item
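GetJobConf relies on Hadoop Streaming exporting job configuration properties as environment variables. According to the Hadoop Streaming documentation, dots become underscores (mapreduce.job.id becomes mapreduce_job_id), and environment variable names are case-sensitive on Linux, so the lookup should keep the original case. A minimal sketch of that lookup; job_conf_env_name and get_job_conf are hypothetical names, and the environ parameter exists only so the lookup can be tested without a running task:

```python
import os
import re


def job_conf_env_name(name):
    """Environment variable name for a job property: every character that is
    not a letter or digit becomes "_", and the case is left unchanged."""
    return re.sub(r"[^0-9A-Za-z]", "_", name)


def get_job_conf(name, environ=None):
    """Look up a job property in the task environment; None if it is unset."""
    environ = os.environ if environ is None else environ
    return environ.get(job_conf_env_name(name))
```

Inside a running mapper, get_job_conf("mapreduce.map.input.file") would then return the path of the file the current split comes from, assuming the job exports that property.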

flight_mapper.py:

#! /usr/bin/env python

import sys
import csv
from framework import Mapper

class FlightMapper(Mapper):
    def __init__(self, infile=sys.stdin, separator='\t'):
        super(FlightMapper, self).__init__(infile, separator)

    def Map(self):
        # csv.reader accepts any iterable of lines; iterating a Mapper
        # yields the input lines with trailing whitespace stripped.
        reader = csv.reader(self)
        for row in reader:
            # Skip records without a key (column 3).
            if len(row[3].strip()) == 0:
                continue
            # Treat a missing value (column 6) as 0.
            if len(row[6].strip()) == 0:
                row[6] = 0
            # Emit exactly one key<TAB>value line per record.
            self.Emit(row[3], row[6])

if __name__ == '__main__':
    mapper = FlightMapper(sys.stdin)
    mapper.Map()
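Keeping the per-row logic in a plain function makes it testable without Hadoop or stdin. A minimal sketch, assuming (as FlightMapper does) that column 3 holds the key and column 6 the delay value; map_row and map_lines are hypothetical names, and the guard against rows with fewer than seven columns is an addition the mapper above does not have:

```python
import csv


def map_row(row):
    """Return (key, value) for one parsed CSV row, or None to skip it.

    Column 3 is the key and column 6 the value; an empty value becomes "0".
    Rows that are too short or have an empty key are skipped.
    """
    if len(row) < 7 or not row[3].strip():
        return None
    value = row[6].strip() or "0"
    return row[3], value


def map_lines(lines, sep="\t"):
    """Yield key<sep>value output lines for an iterable of CSV text lines."""
    for row in csv.reader(lines):
        pair = map_row(row)
        if pair is not None:
            yield "{0}{1}{2}".format(pair[0], sep, pair[1])
```

For example, list(map_lines(["a,b,c,JFK,e,f,12", "a,b,c,LAX,e,f,"])) gives ["JFK\t12", "LAX\t0"].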

flight_reducer.py:

#! /usr/bin/env python
import sys

from framework import Reducer

class FlightReducer(Reducer):

    def Reduce(self):
        # Iterating a Reducer yields (key, group) pairs; each item in the
        # group is a [key, value] list split on the separator.
        for key, val in self:
            total = 0.0
            count = 0
            for item in val:
                total += float(item[1])
                count += 1
            # Emit the mean value for this key.
            self.Emit(key, total / count)

if __name__ == '__main__':
    reducer = FlightReducer(sys.stdin)
    reducer.Reduce()
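Because Hadoop sorts mapper output by key before the reducer sees it, the reducer's averaging can be checked in-process by sorting and grouping, the Python equivalent of piping through sort. A minimal sketch with hypothetical function names that reproduces FlightReducer's per-key mean:

```python
from itertools import groupby
from operator import itemgetter


def reduce_averages(sorted_lines, sep="\t"):
    """Yield (key, mean) for key<sep>value lines already sorted by key.

    Consecutive lines with the same key form one group, just as Hadoop
    delivers them to the reducer after the shuffle and sort.
    """
    pairs = (line.rstrip("\n").split(sep, 1) for line in sorted_lines)
    for key, group in groupby(pairs, itemgetter(0)):
        values = [float(value) for _, value in group]
        yield key, sum(values) / len(values)


def simulate(mapper_output):
    """Stand-in for `... | sort | reducer`: sort the mapper output, then reduce."""
    return list(reduce_averages(sorted(mapper_output)))
```

For example, simulate(["JFK\t10", "LAX\t0", "JFK\t20", "LAX\t3"]) returns [("JFK", 15.0), ("LAX", 1.5)].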

Error log:

2017-04-21 06:34:14,341 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-04-21 06:34:14,411 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-04-21 06:34:14,411 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-04-21 06:34:14,420 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-04-21 06:34:14,420 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1492704251350_0012, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@492e5810)
2017-04-21 06:34:14,496 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-04-21 06:34:14,761 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm/usercache/cloudera/appcache/application_1492704251350_0012
2017-04-21 06:34:15,329 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-04-21 06:34:15,751 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-04-21 06:34:15,765 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-04-21 06:34:15,955 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://quickstart.cloudera:8020/user/cloudera/hadoop_practicals_input/flight_records.csv:1432610+1432611
2017-04-21 06:34:15,982 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 4194300(16777200)
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 16
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 13421773
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 16777216
2017-04-21 06:34:15,997 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 4194300; length = 1048576
2017-04-21 06:34:16,000 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-04-21 06:34:16,010 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python, ./flight_mapper.py]
2017-04-21 06:34:16,016 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2017-04-21 06:34:16,016 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2017-04-21 06:34:16,017 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2017-04-21 06:34:16,018 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2017-04-21 06:34:16,021 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
2017-04-21 06:34:16,022 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2017-04-21 06:34:16,023 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2017-04-21 06:34:16,024 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
2017-04-21 06:34:16,025 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
2017-04-21 06:34:16,025 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2017-04-21 06:34:16,049 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,049 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,050 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,063 INFO [Thread-14] org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2017-04-21 06:34:16,068 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,076 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1751/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
USER=cloudera
HADOOP_USER=null
last tool output: |null|

java.io.IOException: Stream closed
at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
at java.io.OutputStream.write(OutputStream.java:116)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Stream closed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exited with code 1 in org.apache.hadoop.streaming.PipeMapRed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Stream closed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exited with code 1 in org.apache.hadoop.streaming.PipeMapRed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: Stream closed
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Stream closed
at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
at java.io.OutputStream.write(OutputStream.java:116)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2017-04-21 06:34:16,085 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2017-04-21 06:34:16,090 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://quickstart.cloudera:8020/user/cloudera/average_delay/_temporary/1/_temporary/attempt_1492704251350_0012_m_000000_0
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.

Command:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -Dstream.non.zero.exit.is.failure=false -input 'hadoop_practicals_input/flight_records.csv' -output average_delay -mapper 'python ./flight_mapper.py' -reducer 'python ./flight_reducer.py' -file ./flight_mapper.py -file ./flight_reducer.py -file ./framework.py
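In the log above, the mapper subprocess exits with code 1 while Hadoop is still writing input to it, so that write fails with java.io.IOException: Stream closed; the underlying cause is whatever Python error the mapper printed to its stderr, which appears in the task attempt's stderr log. If some input records are simply malformed, one option is to skip them rather than let the whole task die. A minimal sketch under that assumption: safe_map and the "Bad records" counter name are hypothetical, and skipped records are reported through the streaming reporter:counter protocol on stderr. This only helps with bad data; a programming error that fails on every record should be fixed, not skipped.

```python
import sys


def safe_map(lines, map_line, out=None, err=None, group="Python Streaming"):
    """Apply map_line to each input line, skipping records that raise.

    map_line is a hypothetical per-record function returning an output
    string, or None to emit nothing. IndexError/ValueError (typical of
    malformed records) are counted on stderr instead of crashing the task.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for line in lines:
        try:
            result = map_line(line)
        except (IndexError, ValueError) as exc:
            # Plain stderr text ends up in the task log for debugging.
            err.write("skipped record: {0!r} ({1})\n".format(line, exc))
            # Increment a job counter via the streaming reporter protocol.
            err.write("reporter:counter:{0},{1},{2}\n".format(group, "Bad records", 1))
            continue
        if result is not None:
            out.write(result + "\n")
```

In a mapper script, this would be called as safe_map(sys.stdin, some_parse_function) in place of a bare loop over sys.stdin.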

Re: Python Streaming

Vidya821 — Sun, 15 Jul 2018 19:43:18 GMT

@Sean, @Clint,
Can we use the mrjob library to run MapReduce Python code on the Cloudera QuickStart VM?

Vidya