Error when using MapReduce Streaming

Contributor

I'm following this tutorial:

http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/

I put cities.txt in /user/root/ and wrote the R script as follows:

#!/usr/bin/env Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)

and then ran the command:

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar \
  -input /user/root/cities.txt \
  -output /user/root/streamer \
  -mapper /bin/cat \
  -reducer script.R \
  -numReduceTasks 2 \
  -file script.R

The map phase reaches 100%, but the reduce phase fails with this error:

16/03/01 11:06:30 INFO mapreduce.Job:  map 100% reduce 50%
16/03/01 11:06:34 INFO mapreduce.Job: Task Id : attempt_1456773989186_0009_r_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Does anyone have an idea what causes this, or has anyone run into it before?

Thanks.
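The stack trace only reports that the reducer subprocess exited with code 1; the R error message itself typically ends up in the reduce task's stderr log rather than on the client console. One way to surface it is to approximate the job's data flow locally: streaming passes records to the mapper and reducer over stdin/stdout and sorts map output by key in between. This is a sketch assuming bash; `simulate_streaming` is a hypothetical helper name, and `sort` is only a stand-in for Hadoop's shuffle (it ignores partitioning across reducers).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): approximate a Hadoop Streaming
# job on one machine. Records flow over stdin/stdout, and `sort` stands in
# for the shuffle that orders map output by key before the reduce phase.
# It does NOT model splitting records across -numReduceTasks.
#
# Usage: simulate_streaming INPUT_FILE MAPPER_CMD REDUCER_CMD
simulate_streaming() {
  local input="$1" mapper="$2" reducer="$3"
  (
    # pipefail: the pipeline fails if ANY stage fails, so a reducer that
    # exits with code 1 is reported here instead of being masked.
    set -o pipefail
    # MAPPER_CMD and REDUCER_CMD are deliberately unquoted so that a
    # command with arguments (e.g. "Rscript script.R") is word-split.
    # LC_ALL=C gives plain byte-wise ordering.
    $mapper < "$input" | LC_ALL=C sort | $reducer
  )
}
```

For the job in the question, copying the input locally (for example with `hdfs dfs -get /user/root/cities.txt .`) and running `simulate_streaming cities.txt /bin/cat ./script.R` exercises the same mapper and reducer chain, with any R error printed straight to the terminal and the exit code available in `$?`. Because records are split across the two reduce tasks, a reducer can receive little or no input, so it may also be worth trying `./script.R < /dev/null` on its own.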

Accepted Solutions

Re: Error when using MapReduce Streaming

@Zaher Mahdhi

I was able to reproduce this, and the job now runs after fixing the R script.

The R script needs to look like this. Notice the line break between env and Rscript:

#!/usr/bin/env
Rscript
f <- file("stdin")
open(f)
state_data = read.table(f)
summary(state_data)

Then run the job with:

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-2.7.1.2.3.4.0-3485.jar \
  -input cities.txt \
  -output streamout11_r \
  -mapper /bin/cat \
  -reducer script.r \
  -numReduceTasks 2 \
  -file script.r
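Since this fix comes down to the script's first line, it can help to check exactly what the kernel will treat as the interpreter line before shipping the file with `-file`. On Linux only the first line counts, and only if it starts with `#!`; a script launched directly (as `./script.r` locally) also needs its execute bit. A minimal bash sketch, where `check_script` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show how the kernel would launch a script directly.
# Only the FIRST line is treated as the interpreter line, and only when it
# starts with "#!"; the file also needs the execute bit to run as ./script.
check_script() {
  local script="$1" first=""
  # read returns non-zero for a one-line file without a trailing newline,
  # but still fills $first, so ignore its status.
  IFS= read -r first < "$script" || true
  case "$first" in
    '#!'*) echo "interpreter line: ${first:2}" ;;
    *)     echo "no #! line" ;;
  esac
  if [ -x "$script" ]; then
    echo "executable: yes"
  else
    echo "executable: no"
  fi
}
```

Running `check_script script.R` on the question's version reports `/usr/bin/env Rscript`, which tells `env` to look up `Rscript` on the `PATH`, so R has to be installed on every node that runs a reduce task. With the line break shown above, the reported interpreter line is `/usr/bin/env` on its own.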


Re: Error when using MapReduce Streaming

Contributor

That solved the error, but then the map job got stuck, and even after I killed the job the YARN container was still running, so I had to kill it manually. I'll come back to this shortly.
