
Error when using MapReduce Streaming

Rising Star

I'm following this tutorial:

http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/

I put cities.txt in /user/root/ and wrote the R script as follows:

#!/usr/bin/env Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)

and then ran this command:

 hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar -input /user/root/cities.txt -output /user/root/streamer -mapper /bin/cat -reducer script.R -numReduceTasks 2 -file script.R 
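Since Hadoop Streaming just pipes records through the mapper and reducer as ordinary processes, the pipeline can be sanity-checked locally before submitting the job. A minimal sketch (the sample file, its contents, and the `wc -l` stand-in reducer are illustrative assumptions, not from the tutorial):

```shell
# Tiny sample input (hypothetical values, two columns like cities.txt)
printf 'Austin TX\nBoston MA\nDenver CO\n' > cities_sample.txt

# Streaming pipes records through ordinary processes:
# mapper (/bin/cat) | sort (stand-in for the shuffle) | reducer.
# wc -l stands in for script.R here so the sketch runs without R installed;
# swap in ./script.R (after chmod +x script.R) to test the real reducer.
cat cities_sample.txt | sort | wc -l
```

If `./script.R` fails here with the same kind of error, the problem is in the script itself (shebang, permissions, or R code), not in the Hadoop job configuration.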

The map phase completes at 100%, and the reduce phase fails with this error:

16/03/01 11:06:30 INFO mapreduce.Job:  map 100% reduce 50%
16/03/01 11:06:34 INFO mapreduce.Job: Task Id : attempt_1456773989186_0009_r_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
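Exit code 1 only says that the reducer subprocess died; the underlying R error message goes to the task's stderr. Assuming log aggregation is enabled, it can usually be retrieved with `yarn logs` (the application id here is inferred from the failed attempt id in the output above):

```shell
# Pull the aggregated task logs and look for the reducer's stderr section.
yarn logs -applicationId application_1456773989186_0009 | grep -B 2 -A 20 stderr
```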

Does anyone have an idea, or has anyone encountered this before?

Thanks.

1 ACCEPTED SOLUTION

Master Mentor

@Zaher Mahdhi

I was able to reproduce this, and the job runs after fixing the R script.

The R script needs to look like this. Notice the line break between env and Rscript:

#!/usr/bin/env
Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)

and then run:

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar -input cities.txt -output streamout11_r -mapper /bin/cat -reducer script.r -numReduceTasks 2 -file script.r


2 REPLIES

Rising Star

That solved the problem, but the map job got stuck, and even after killing the job the YARN container still existed; I had to kill it manually. I'll come back to this shortly.
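When a job hangs like this, the whole application can usually be killed through YARN rather than by killing the container process by hand. A sketch (the application id below is illustrative, not from this thread):

```shell
# List running applications to find the stuck one, then kill it;
# killing the application also tears down its containers.
yarn application -list -appStates RUNNING
yarn application -kill application_1456773989186_0010
```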