Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Error when using MapReduce Streaming

avatar
Rising Star

I'm following this tutorial:

http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/

I put cities.txt in /user/root/ and the R script as following :

#!/usr/bin/env Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)

and then run the command:

 hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar -input /user/root/cities.txt -output /user/root/streamer -mapper /bin/cat -reducer script.R -numReduceTasks 2 -file script.R 

Map works till 100% and reduce shows this error:

16/03/01 11:06:30 INFO mapreduce.Job:  map 100% reduce 50%
16/03/01 11:06:34 INFO mapreduce.Job: Task Id : attempt_1456773989186_0009_r_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Does any one have any idea or encountered this before ?

Thanks.

1 ACCEPTED SOLUTION

avatar
Master Mentor

@Zaher Mahdhi

I was able to reproduce this and now job is running after fixing R script.

R script needs to look like this....Notice the next line between env and Rscript

#!/usr/bin/env
Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar  -input cities.txt -output streamout11_r -mapper /bin/cat -reducer script.r -numReduceTasks 2 -file script.r

View solution in original post

2 REPLIES 2

avatar
Master Mentor

@Zaher Mahdhi

I was able to reproduce this and now job is running after fixing R script.

R script needs to look like this....Notice the next line between env and Rscript

#!/usr/bin/env
Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar  -input cities.txt -output streamout11_r -mapper /bin/cat -reducer script.r -numReduceTasks 2 -file script.r

avatar
Rising Star

That solved the problem but the Map job stacked, and even after killing it the Yarn container still exists I had to kill it manually. i'll be back to this shortly.