Support Questions

Find answers, ask questions, and share your expertise

Python Streaming

avatar
Contributor

I'm trying to use my local installation of Cloudera Quickstart VM to do a small mapreduce job in Python.

My test script works when I explicitly add python to the script:

  # cat inputfile.txt | python mymapper.py | sort | python myreducer.py

I need to add python to the path in the vm.  What's the best way to do this so it finds python from the command line and in Hadoop?  I haven't been successful trying to find and modify the right files in the Cloudera VM.

(I was able to run this on AWS.  I tried from the hadoop command line also:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar \

-input inputfile.txt \

-output output010 \

-mapper mymapper.py \

-file mymapper.py \

-combiner myreducer.py \

-reducer myreducer.py \

-file myreducer.py

 ... and it fails)

Any help to get the  right would be appreciated.

thanks,

jp

1 ACCEPTED SOLUTION

avatar
Contributor

It took me a while to figure out.  I just got it a minute ago. 

I was running scripts that I developed in Windows (where end-of-line = cr+lf).  I needed to strip out the "cr" so the python interpreter in Linux wouldn't be looking for /usr/bin/env python/r, but /usr/bin/env python. 

Now I can move on.

 

jp

View solution in original post

18 REPLIES 18

avatar
Explorer

Hi,

 

I have a similar problem. I wrote a simple mapper and reducer to read input file and calculate total number of lines. 

 

This works great locally

cat access.log | ./linecount_mapper.py | ./linecount_reduce.py

 

Same input files and scripts, when used in streaming returns this error message. Any suggestions?

 

java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja

 

Thanks,

Charmee

avatar
Explorer

This is how I invoke the mapreduce job

 

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.3.0.jar

-input /user/certification/sandbox/access.log -output /user/certification/sandboxout -mapper /user/certification/sandbox/linecount_mapper.py -reducer /user/certification/sandbox/linecount_reduce.py

 

I also tried using hadoop-streaming.jar, it gives me the same error as well.

 

Any suggestions are greatly appreciated.

avatar
Explorer

Hi Folks,

 

Can anyone help me here as well.

 

I also get same error as follow:

 /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output2
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar] /tmp/streamjob725052303650188667.jar tmpDir=null
14/08/26 02:44:06 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:06 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:06 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
14/08/26 02:44:06 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
14/08/26 02:44:06 ERROR streaming.StreamJob: Error Launching job : Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
Streaming Command Failed!
[hduser@hdmachine1 ~]$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output op-1
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar] /tmp/streamjob6895399468399805454.jar tmpDir=null
14/08/26 02:44:21 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:21 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:22 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/08/26 02:44:22 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 8e266e052e423af592871e2dfe09d54c03f6a0e8]
14/08/26 02:44:22 INFO mapred.FileInputFormat: Total input paths to process : 1
14/08/26 02:44:22 INFO mapreduce.JobSubmitter: number of splits:2
14/08/26 02:44:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1409004459008_0013
14/08/26 02:44:23 INFO impl.YarnClientImpl: Submitted application application_1409004459008_0013
14/08/26 02:44:23 INFO mapreduce.Job: The url to track the job: http://hdmachine1.example.com:8088/proxy/application_1409004459008_0013/
14/08/26 02:44:23 INFO mapreduce.Job: Running job: job_1409004459008_0013
14/08/26 02:44:27 INFO mapreduce.Job: Job job_1409004459008_0013 running in uber mode : false
14/08/26 02:44:27 INFO mapreduce.Job:  map 0% reduce 0%
14/08/26 02:44:30 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:31 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:34 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:35 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:38 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:39 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: configuration exception
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
	at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
	... 22 more
Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 24 more

14/08/26 02:44:43 INFO mapreduce.Job:  map 100% reduce 100%
14/08/26 02:44:44 INFO mapreduce.Job: Job job_1409004459008_0013 failed with state FAILED due to: Task failed task_1409004459008_0013_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

14/08/26 02:44:44 INFO mapreduce.Job: Counters: 13
	Job Counters 
		Failed map tasks=7
		Killed map tasks=1
		Launched map tasks=8
		Other local map tasks=6
		Rack-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=17357
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=17357
		Total vcore-seconds taken by all map tasks=17357
		Total megabyte-seconds taken by all map tasks=17773568
	Map-Reduce Framework
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
14/08/26 02:44:44 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!

 

I tried to make the following from all previous posts I saw 

1- replace   #!/usr/bin/env python3.2  to  #!/usr/bin/python3.2

2- I copied the mapper.py and reducer.py to /tmp and made both scripts with 777 ermissions

3- I restarted all hadoop servies

4- I used both streaming jar files  /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar  and /usr/lib/hadoop-mapreduce/hadoop-streaming.jar   but they both gave same Error message

 

Here is the command I use again and I will appreciate if someone could explain what the hell is wrong with this

$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output2

 

avatar
The bottom of the stack trace says "Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory". How many nodes are there in this cluster? Have you copied the mapper.py program to all the nodes? It also needs to be executable (chmod 755 mapper.py)
Regards,
Gautam Gopalakrishnan

avatar
Explorer

Thanks GutamG for your reply,

 

 

The cluster has 10 DataNodes and 1 NameNode

I didn't copy the scripts to all nodes, as I expect which is normal that -files option shall copy them to the HDFS where it is by default reachable via all nodes. I am sure it is executable, I even made it 777.

 

 

avatar

Please refer to http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStr... which shows the correct usage. Use -file for each file to be copied across.

For example, try:

 

$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -file /tmp/mapper.py -file /tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output

 

Edit: hyperlink and space char in command

Regards,
Gautam Gopalakrishnan

avatar
Contributor

I renamed my mapper and reducer to jpm.py and jpr.py to make sure my spelling is right.  The reducer part of the "cat" doesn't work unless it's preceeded by "python".  Then it completes successfully.

In hadoop map-reduce, from the command line, I've gotten the process to complete, but it yields no results.  I reduced the reducer functionality to just pass on what comes from the mapper.  It completes, but doesn't yield any results in the output (file size = 0).  I removed the reducer completely and I get what I expect from the mapper.

I'd like to progress to the gui's and get a taste of pig and hive in cloudera by the end of the month.  I think I'm going to try all over again with a fresh vm.

 

avatar
Explorer

@Sean,@Clint,
Can we use mrjob library to execute the mapreduce python code in cloudera quickstart vm ?

 

Vidya

avatar

Hi,

 

I have similar problem, my python code is working fine when I am running it locally using cat command, but it is not working fine when I am running it on Hadoop. Please find below my code, error, command to run the program and permission on my files:

 

Permission on my files:

 

-rwxrwxr-x 1 cloudera cloudera 690 Apr 20 14:15 flight_mapper.py
-rw-r--r-- 1 cloudera cloudera 2865221 Apr 19 08:21 flight_records.csv
-rwxrwxr-x 1 cloudera cloudera 501 Apr 19 13:39 flight_reducer.py
-rwxrwxrwx 1 cloudera cloudera 1349 Apr 21 06:20 framework.py

 

Framework.py code:

 

#! /usr/bin/env python
import os
import sys

from itertools import groupby
from operator import itemgetter

separator = "\t"

class Streaming(object):

@staticmethod
def GetJobConf(name):
name = name.replace(".","_").upper()
return os.environ.get(name)


def __init__(self,infile=sys.stdin,separator = separator):
self.infile = infile
self.sep = separator

def Status(self,message):
sys.stderr.write("reporter:status:{}\n".format(message))

def Counter(self,counter,amount=1,group="Python Streaming"):
msg = "reporter:counter:{0},{1},{2}\n".format(group,counter,amount)
sys.stderr.write(msg)

def Emit(self,key,value):
sys.stdout.write("{0}{1}{2}\n".format(key,self.sep,value))

def Read(self):
for line in self.infile:
yield line.rstrip()

def __iter__(self):
for line in self.Read():
yield line


class Mapper(Streaming):

def Map(self):
raise NotImplementedError("Mapper must implement a Map method")

class Reducer(Streaming):

def Reduce(self):
raise NotImplementedError("Reducer must implement a Reduce method")

def __iter__(self):
generator = (line.split(self.sep,1) for line in self.Read())
for item in groupby(generator,itemgetter(0)):
yield item

 

flight_mapper.py

 

 

#! /usr/bin/env python

import sys
import csv
from framework import Mapper

class FlightMapper(Mapper):
def __init__(self,infile=sys.stdin,separator='\t'):
super(FlightMapper,self).__init__(infile,separator)

def Map(self):
reader = csv.reader(self)
for row in reader:
if len(row[3].strip()) == 0:
continue
if len(row[6].strip()) == 0:
row[6] = 0
self.Emit(row[3],row[6])
else:
sys.stdout.write("{0}\t{1}\n").format(row[3],row[6])
self.Emit(row[3],row[6])

if __name__ == '__main__':
mapper = FlightMapper(sys.stdin)
mapper.Map()

 

flight_reducer.py:

 

#! /usr/bin/env python
import sys

from framework import Reducer
from itertools import groupby
from operator import itemgetter

class FlightReducer(Reducer):

def Reduce(self):
for key, val in self:
total = 0.0
count = 0
for item in val:
total += float(item[1])
count += 1
self.Emit(key,float(total)/float(count))

if __name__ == '__main__':
reducer = FlightReducer(sys.stdin)
reducer.Reduce()

 

Error log:

 

2017-04-21 06:34:14,341 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-04-21 06:34:14,411 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-04-21 06:34:14,411 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-04-21 06:34:14,420 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-04-21 06:34:14,420 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1492704251350_0012, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@492e5810)
2017-04-21 06:34:14,496 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-04-21 06:34:14,761 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm/usercache/cloudera/appcache/application_1492704251350_0012
2017-04-21 06:34:15,329 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-04-21 06:34:15,751 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-04-21 06:34:15,765 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-04-21 06:34:15,955 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://quickstart.cloudera:8020/user/cloudera/hadoop_practicals_input/flight_records.csv:1432610+1432611
2017-04-21 06:34:15,982 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 4194300(16777200)
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 16
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 13421773
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 16777216
2017-04-21 06:34:15,997 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 4194300; length = 1048576
2017-04-21 06:34:16,000 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-04-21 06:34:16,010 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python, ./flight_mapper.py]
2017-04-21 06:34:16,016 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2017-04-21 06:34:16,016 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2017-04-21 06:34:16,017 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2017-04-21 06:34:16,018 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2017-04-21 06:34:16,021 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
2017-04-21 06:34:16,022 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2017-04-21 06:34:16,023 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2017-04-21 06:34:16,024 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
2017-04-21 06:34:16,025 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
2017-04-21 06:34:16,025 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2017-04-21 06:34:16,049 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,049 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,050 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,063 INFO [Thread-14] org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2017-04-21 06:34:16,068 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,076 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1751/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
USER=cloudera
HADOOP_USER=null
last tool output: |null|

java.io.IOException: Stream closed
at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
at java.io.OutputStream.write(OutputStream.java:116)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Stream closed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exited with code 1 in org.apache.hadoop.streaming.PipeMapRed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Stream closed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exited with code 1 in org.apache.hadoop.streaming.PipeMapRed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: Stream closed
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Stream closed
at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
at java.io.OutputStream.write(OutputStream.java:116)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2017-04-21 06:34:16,085 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2017-04-21 06:34:16,090 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://quickstart.cloudera:8020/user/cloudera/average_delay/_temporary/1/_temporary/attempt_1492704251350_0012_m_000000_0
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.

 

 

Command:

 

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -Dstream.non.zero.exit.is.failure=false -input 'hadoop_practicals_input/flight_records.csv' -output average_delay -mapper 'python ./flight_mapper.py' -reducer 'python ./flight_reducer.py' -file ./flight_mapper.py -file ./flight_reducer.py -file ./framework.py