Created on 07-31-2013 09:07 PM - edited 09-16-2022 01:46 AM
I'm trying to use my local installation of Cloudera Quickstart VM to do a small mapreduce job in Python.
My test pipeline works when I explicitly invoke python on the command line:
# cat inputfile.txt | python mymapper.py | sort | python myreducer.py
I need to add python to the path in the VM. What's the best way to do this so that python is found both from the command line and in Hadoop? I haven't been able to find and modify the right files in the Cloudera VM.
(I was able to run this on AWS. I also tried from the hadoop command line:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar \
-input inputfile.txt \
-output output010 \
-mapper mymapper.py \
-file mymapper.py \
-combiner myreducer.py \
-reducer myreducer.py \
-file myreducer.py
... and it fails)
Any help getting this right would be appreciated.
thanks,
jp
Created 08-06-2013 05:36 PM
It took me a while to figure out. I just got it a minute ago.
I was running scripts that I developed on Windows (where end-of-line = CR+LF). I needed to strip out the CR so that Linux would look for the interpreter named by /usr/bin/env python, not /usr/bin/env python\r.
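For anyone hitting the same thing, here is a minimal sketch of stripping the carriage returns in Python. The function names are made up for illustration; a tool such as dos2unix does the same job.

```python
def strip_carriage_returns(data):
    """Return the bytes with Windows CRLF line endings converted to Unix LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at path with LF line endings; return True if it changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = strip_carriage_returns(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
    return converted != original
```

Calling convert_file on each script before submitting the job leaves files that already use LF endings untouched.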
Now I can move on.
jp
Created 08-14-2013 01:12 PM
Hi,
I have a similar problem. I wrote a simple mapper and reducer that read an input file and calculate the total number of lines.
This works great locally
cat access.log | ./linecount_mapper.py | ./linecount_reduce.py
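A line-count mapper and reducer of this kind can be sketched as follows. This is only an illustration, not the actual linecount_mapper.py and linecount_reduce.py; the function names and the "lines" key are assumptions.

```python
def map_lines(lines):
    """Mapper: emit one tab-separated ("lines", 1) record per input line."""
    for _ in lines:
        yield "lines\t1"


def reduce_counts(records):
    """Reducer: sum the counts carried by the mapper records."""
    total = 0
    for record in records:
        _key, value = record.rstrip("\n").split("\t", 1)
        total += int(value)
    return total
```

In a real streaming script, map_lines would read sys.stdin and print each record, and reduce_counts would read the mapper records from sys.stdin.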
With the same input file and scripts, the streaming job returns this error message. Any suggestions?
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
Thanks,
Charmee
Created 08-14-2013 01:36 PM
This is how I invoke the MapReduce job:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.3.0.jar \
-input /user/certification/sandbox/access.log -output /user/certification/sandboxout -mapper /user/certification/sandbox/linecount_mapper.py -reducer /user/certification/sandbox/linecount_reduce.py
I also tried using hadoop-streaming.jar; it gives me the same error.
Any suggestions are greatly appreciated.
Created 08-25-2014 06:48 PM
Hi Folks,
Can anyone help me here as well?
I get the same error, as follows:
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output2
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar] /tmp/streamjob725052303650188667.jar tmpDir=null
14/08/26 02:44:06 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:06 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032
14/08/26 02:44:06 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
14/08/26 02:44:06 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
14/08/26 02:44:06 ERROR streaming.StreamJob: Error Launching job : Output directory hdfs://hdmachine1.example.com:8020/user/hduser/output2 already exists
Streaming Command Failed!
[hduser@hdmachine1 ~]$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output op-1 packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar] /tmp/streamjob6895399468399805454.jar tmpDir=null 14/08/26 02:44:21 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032 14/08/26 02:44:21 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8032 14/08/26 02:44:22 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library 14/08/26 02:44:22 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 8e266e052e423af592871e2dfe09d54c03f6a0e8] 14/08/26 02:44:22 INFO mapred.FileInputFormat: Total input paths to process : 1 14/08/26 02:44:22 INFO mapreduce.JobSubmitter: number of splits:2 14/08/26 02:44:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1409004459008_0013 14/08/26 02:44:23 INFO impl.YarnClientImpl: Submitted application application_1409004459008_0013 14/08/26 02:44:23 INFO mapreduce.Job: The url to track the job: http://hdmachine1.example.com:8088/proxy/application_1409004459008_0013/ 14/08/26 02:44:23 INFO mapreduce.Job: Running job: job_1409004459008_0013 14/08/26 02:44:27 INFO mapreduce.Job: Job job_1409004459008_0013 running in uber mode : false 14/08/26 02:44:27 INFO mapreduce.Job: map 0% reduce 0% 14/08/26 02:44:30 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_0, Status : FAILED Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 more Caused by: java.lang.RuntimeException: configuration exception at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 
22 more Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 more Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) ... 24 more 14/08/26 02:44:31 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_0, Status : FAILED Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 
9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 more Caused by: java.lang.RuntimeException: configuration exception at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 22 more Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 more Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) ... 
24 more 14/08/26 02:44:34 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_1, Status : FAILED Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 
14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 more Caused by: java.lang.RuntimeException: configuration exception at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 22 more Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 more Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) ... 
24 more 14/08/26 02:44:35 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_1, Status : FAILED Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 
14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 more Caused by: java.lang.RuntimeException: configuration exception at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 22 more Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 more Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) ... 
24 more 14/08/26 02:44:38 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000001_2, Status : FAILED Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 
14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 more Caused by: java.lang.RuntimeException: configuration exception at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 22 more Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 more Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) ... 
24 more 14/08/26 02:44:39 INFO mapreduce.Job: Task Id : attempt_1409004459008_0013_m_000000_2, Status : FAILED Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 
14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 more Caused by: java.lang.RuntimeException: configuration exception at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 22 more Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 more Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) ... 24 more 14/08/26 02:44:43 INFO mapreduce.Job: map 100% reduce 100% 14/08/26 02:44:44 INFO mapreduce.Job: Job job_1409004459008_0013 failed with state FAILED due to: Task failed task_1409004459008_0013_m_000000 Job failed as tasks failed. 
failedMaps:1 failedReduces:0 14/08/26 02:44:44 INFO mapreduce.Job: Counters: 13 Job Counters Failed map tasks=7 Killed map tasks=1 Launched map tasks=8 Other local map tasks=6 Rack-local map tasks=2 Total time spent by all maps in occupied slots (ms)=17357 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=17357 Total vcore-seconds taken by all map tasks=17357 Total megabyte-seconds taken by all map tasks=17773568 Map-Reduce Framework CPU time spent (ms)=0 Physical memory (bytes) snapshot=0 Virtual memory (bytes) snapshot=0 14/08/26 02:44:44 ERROR streaming.StreamJob: Job not Successful! Streaming Command Failed!
I tried the following, based on all the previous posts I saw:
1- replaced #!/usr/bin/env python3.2 with #!/usr/bin/python3.2
2- copied mapper.py and reducer.py to /tmp and gave both scripts 777 permissions
3- restarted all Hadoop services
4- used both streaming jar files, /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar and /usr/lib/hadoop-mapreduce/hadoop-streaming.jar, but both gave the same error message
Here is the command I use again. I would appreciate it if someone could explain what the hell is wrong with it:
$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -files /tmp/mapper.py,/tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output2
Created 08-29-2014 08:56 PM
Created on 08-31-2014 08:04 PM - edited 08-31-2014 08:04 PM
Thanks GutamG for your reply,
The cluster has 10 DataNodes and 1 NameNode.
I didn't copy the scripts to all nodes, since I expected the -files option to copy them to HDFS, where they are reachable from all nodes by default. I am sure they are executable; I even made them 777.
Created on 08-31-2014 08:33 PM - edited 08-31-2014 08:35 PM
Please refer to http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStr... which shows the correct usage. Use -file for each file to be copied across.
For example, try:
$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar -file /tmp/mapper.py -file /tmp/reducer.py -mapper /tmp/mapper.py -reducer /tmp/reducer.py -input gutenberg/4300.txt -output output
Edit: hyperlink and space char in command
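One more thing that may be worth checking: files shipped with -file are placed in each task's working directory, so -mapper and -reducer usually name the script by its base name (as in the first post's -mapper mymapper.py -file mymapper.py) rather than by a client-side path such as /tmp/mapper.py. Below is a minimal sketch that assembles such a command. build_streaming_command is a hypothetical helper, not part of Hadoop, and it assumes the hadoop executable is on the PATH.

```python
import os


def build_streaming_command(jar, mapper, reducer, input_path, output_path):
    """Build the argv list for a Hadoop Streaming job (hypothetical helper).

    Each script is shipped with -file using its local path, while -mapper and
    -reducer refer to it by base name, the name it has in the task's
    working directory.
    """
    argv = ["hadoop", "jar", jar]
    for script in (mapper, reducer):
        argv += ["-file", script]
    argv += [
        "-mapper", os.path.basename(mapper),
        "-reducer", os.path.basename(reducer),
        "-input", input_path,
        "-output", output_path,
    ]
    return argv
```

The returned list can be passed to subprocess.call, or joined with spaces to inspect the command before running it.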
Created 08-01-2013 09:22 PM
I renamed my mapper and reducer to jpm.py and jpr.py to make sure my spelling is right. The reducer part of the "cat" pipeline doesn't work unless it's preceded by "python". Then it completes successfully.
In Hadoop MapReduce, from the command line, I've gotten the job to complete, but it yields no results. I reduced the reducer to just pass on what comes from the mapper. It still completes without producing any output (file size = 0). When I remove the reducer completely, I get what I expect from the mapper.
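To narrow down an empty-output problem like this, the map, sort, reduce sequence can be simulated locally. This is a minimal sketch, assuming the mapper and reducer are plain Python callables over lines; simulate_streaming and identity_reducer are hypothetical names.

```python
def identity_reducer(lines):
    """Pass every mapper record through unchanged."""
    for line in lines:
        yield line


def simulate_streaming(mapper, reducer, lines):
    """Apply mapper, sort the records as `sort` does in the cat pipeline, then apply reducer."""
    mapped = sorted(mapper(lines))
    return list(reducer(mapped))
```

If the identity reducer returns records here but the Hadoop job still writes an empty file, the problem is more likely in how the reducer script is launched than in its logic.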
I'd like to progress to the GUIs and get a taste of Pig and Hive in Cloudera by the end of the month. I think I'm going to try all over again with a fresh VM.
Created on 07-15-2018 12:33 PM - edited 07-15-2018 12:43 PM
@Sean, @Clint,
Can we use the mrjob library to run Python MapReduce code in the Cloudera QuickStart VM?
Vidya
Created 04-21-2017 06:50 AM
Hi,
I have a similar problem: my Python code works fine when I run it locally using the cat command, but it does not work when I run it on Hadoop. Please find below my code, the error, the command used to run the program, and the permissions on my files:
Permission on my files:
-rwxrwxr-x 1 cloudera cloudera 690 Apr 20 14:15 flight_mapper.py
-rw-r--r-- 1 cloudera cloudera 2865221 Apr 19 08:21 flight_records.csv
-rwxrwxr-x 1 cloudera cloudera 501 Apr 19 13:39 flight_reducer.py
-rwxrwxrwx 1 cloudera cloudera 1349 Apr 21 06:20 framework.py
Framework.py code:
#! /usr/bin/env python
import os
import sys
from itertools import groupby
from operator import itemgetter
separator = "\t"
class Streaming(object):
    @staticmethod
    def GetJobConf(name):
        name = name.replace(".","_").upper()
        return os.environ.get(name)
    def __init__(self,infile=sys.stdin,separator = separator):
        self.infile = infile
        self.sep = separator
    def Status(self,message):
        sys.stderr.write("reporter:status:{}\n".format(message))
    def Counter(self,counter,amount=1,group="Python Streaming"):
        msg = "reporter:counter:{0},{1},{2}\n".format(group,counter,amount)
        sys.stderr.write(msg)
    def Emit(self,key,value):
        sys.stdout.write("{0}{1}{2}\n".format(key,self.sep,value))
    def Read(self):
        for line in self.infile:
            yield line.rstrip()
    def __iter__(self):
        for line in self.Read():
            yield line
class Mapper(Streaming):
    def Map(self):
        raise NotImplementedError("Mapper must implement a Map method")
class Reducer(Streaming):
    def Reduce(self):
        raise NotImplementedError("Reducer must implement a Reduce method")
    def __iter__(self):
        generator = (line.split(self.sep,1) for line in self.Read())
        for item in groupby(generator,itemgetter(0)):
            yield item
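The grouping that Reducer.__iter__ relies on can be tried on its own. Below is a small sketch with a hypothetical group_by_key helper. It assumes the input lines are already sorted by key, which is how records arrive at a reducer; itertools.groupby only merges adjacent records with equal keys.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key(lines, sep="\t"):
    """Yield (key, [values]) for consecutive lines that share the same key."""
    pairs = (line.rstrip("\n").split(sep, 1) for line in lines)
    for key, group in groupby(pairs, itemgetter(0)):
        yield key, [value for _key, value in group]
```

With unsorted input the same key shows up in several groups, which is why the local test pipeline needs the sort step between mapper and reducer.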
flight_mapper.py
#! /usr/bin/env python
import sys
import csv
from framework import Mapper
class FlightMapper(Mapper):
    def __init__(self,infile=sys.stdin,separator='\t'):
        super(FlightMapper,self).__init__(infile,separator)
    def Map(self):
        reader = csv.reader(self)
        for row in reader:
            # Skip rows whose key column (index 3) is empty.
            if len(row[3].strip()) == 0:
                continue
            # Treat an empty value column (index 6) as 0.
            if len(row[6].strip()) == 0:
                row[6] = 0
            # Emit each record once; Emit already formats the line as key<TAB>value.
            self.Emit(row[3],row[6])
if __name__ == '__main__':
    mapper = FlightMapper(sys.stdin)
    mapper.Map()
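The mapper logic can also be checked without Hadoop or framework.py. Here is a self-contained sketch, assuming column 3 is the key and column 6 the value as in the code above. map_rows is a hypothetical name, and the row-length guard is an addition that the original code does not have.

```python
import csv


def map_rows(lines):
    """Yield (key, value) pairs from CSV lines, mirroring the mapper's rules."""
    for row in csv.reader(lines):
        # Added guard (assumption): skip rows too short to have a column 6.
        if len(row) < 7 or not row[3].strip():
            continue
        # An empty value column is treated as "0".
        value = row[6].strip() or "0"
        yield row[3], value
```

Running it over a few lines of flight_records.csv shows quickly whether the column indexes match the file's layout.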
flight_reducer.py:
#! /usr/bin/env python
import sys
from framework import Reducer
from itertools import groupby
from operator import itemgetter
class FlightReducer(Reducer):
    def Reduce(self):
        for key, val in self:
            total = 0.0
            count = 0
            for item in val:
                total += float(item[1])
                count += 1
            self.Emit(key,float(total)/float(count))
if __name__ == '__main__':
    reducer = FlightReducer(sys.stdin)
    reducer.Reduce()
Error log:
2017-04-21 06:34:14,341 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-04-21 06:34:14,411 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-04-21 06:34:14,411 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-04-21 06:34:14,420 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-04-21 06:34:14,420 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1492704251350_0012, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@492e5810)
2017-04-21 06:34:14,496 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-04-21 06:34:14,761 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm/usercache/cloudera/appcache/application_1492704251350_0012
2017-04-21 06:34:15,329 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-04-21 06:34:15,751 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-04-21 06:34:15,765 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-04-21 06:34:15,955 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://quickstart.cloudera:8020/user/cloudera/hadoop_practicals_input/flight_records.csv:1432610+1432611
2017-04-21 06:34:15,982 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 4194300(16777200)
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 16
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 13421773
2017-04-21 06:34:15,996 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 16777216
2017-04-21 06:34:15,997 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 4194300; length = 1048576
2017-04-21 06:34:16,000 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-04-21 06:34:16,010 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python, ./flight_mapper.py]
2017-04-21 06:34:16,016 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2017-04-21 06:34:16,016 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2017-04-21 06:34:16,017 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2017-04-21 06:34:16,018 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
2017-04-21 06:34:16,020 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2017-04-21 06:34:16,021 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
2017-04-21 06:34:16,022 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2017-04-21 06:34:16,023 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2017-04-21 06:34:16,024 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
2017-04-21 06:34:16,025 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
2017-04-21 06:34:16,025 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2017-04-21 06:34:16,049 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,049 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,050 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,063 INFO [Thread-14] org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2017-04-21 06:34:16,068 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
2017-04-21 06:34:16,076 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1751/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
USER=cloudera
HADOOP_USER=null
last tool output: |null|
java.io.IOException: Stream closed
at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
at java.io.OutputStream.write(OutputStream.java:116)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Stream closed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exited with code 1 in org.apache.hadoop.streaming.PipeMapRed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Stream closed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exited with code 1 in org.apache.hadoop.streaming.PipeMapRed
2017-04-21 06:34:16,079 INFO [main] org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: Stream closed
2017-04-21 06:34:16,079 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Stream closed
at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
at java.io.OutputStream.write(OutputStream.java:116)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-04-21 06:34:16,085 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2017-04-21 06:34:16,090 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://quickstart.cloudera:8020/user/cloudera/average_delay/_temporary/1/_temporary/attempt_1492704251350_0012_m_000000_0
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2017-04-21 06:34:16,094 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
Command:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
-Dstream.non.zero.exit.is.failure=false \
-input 'hadoop_practicals_input/flight_records.csv' \
-output average_delay \
-mapper 'python ./flight_mapper.py' \
-reducer 'python ./flight_reducer.py' \
-file ./flight_mapper.py \
-file ./flight_reducer.py \
-file ./framework.py
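The log above shows the mapper subprocess exiting with code 1 after 1751 records were read and none written, which usually means the Python script raised an uncaught exception. A minimal sketch of a mapper loop that reports bad records on stderr instead of dying is below. The function name run_mapper and the column layout (key in column 0, numeric delay in column 1) are assumptions for illustration only, not the contents of the real flight_mapper.py.

```python
import sys
import traceback


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Write one tab-separated key/value pair per well-formed CSV line."""
    for line_number, line in enumerate(lines, start=1):
        try:
            fields = line.rstrip("\r\n").split(",")
            # Hypothetical layout: column 0 is the key, column 1 a numeric delay.
            key, delay = fields[0], float(fields[1])
            out.write("%s\t%s\n" % (key, delay))
        except Exception:
            # Report the bad record and its traceback on stderr, then keep
            # reading so the process does not exit while input is still piped in.
            err.write("skipping line %d: %r\n" % (line_number, line))
            traceback.print_exc(file=err)
```

To use it as a streaming mapper, the script would end with a call to run_mapper(sys.stdin). Whatever the script writes to stderr should then show up in the stderr log of the failed task attempt, which makes it easier to see which record or import triggers the exit.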