
Docker in Hadoop Streaming

New Contributor

I'm having trouble running an application packaged in a Docker image as the mapper in a Hadoop Streaming job.

The Docker application and code have been tested and run on a single-node cluster on Ubuntu with Hadoop 2.7.1. I'm trying to get the same application running on Cloudera 5.5.1 with Hadoop 2.6.


The hadoop streaming job is:

hadoop jar $HADOOP_HOME/jars/hadoop-streaming-2.6.0-cdh5.5.1.jar \
-D mapred.reduce.tasks=0 \
-D \
-input input \
-output output \
-file \
-mapper ""
/usr/bin/docker run -i mapper_outfirst /opt/

cat somtest | ./


I'm getting the error:

16/02/03 10:30:37 INFO mapreduce.Job: Task Id : attempt_1452850747173_0019_m_000000_0, Status : FAILED

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1


Grepping /var/log/messages for docker, it looks like the job never launches Docker, since there are no kernel log entries for it. I added cloudera-scm, yarn, oozie, mapred, hdfs and flume to the docker group. The user that submits the job is also in the docker group. Which user launches the mapper script in Hadoop Streaming? Is there any way to see whether the script is actually launched in Hadoop Streaming? Other Hadoop Streaming jobs, where the mapper was a Python or bash script, have completed.
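
One way to check whether the mapper was launched at all is to read the failed attempt's stderr log, since anything a streaming mapper writes to stderr ends up there. The following is only a sketch, assuming that log aggregation is enabled on the cluster and that the yarn CLI is on the PATH. The helper name app_id_from_attempt is hypothetical. It relies on the attempt ID from the error above sharing its cluster timestamp and sequence number with the YARN application ID.

```shell
#!/bin/sh
# Derive the YARN application ID from a MapReduce task attempt ID.
# Attempt IDs look like: attempt_<clusterTimestamp>_<sequence>_<m|r>_<task>_<try>
# app_id_from_attempt is a hypothetical helper name, not a Hadoop command.
app_id_from_attempt() (
  rest="${1#attempt_}"   # drop the "attempt_" prefix
  ts="${rest%%_*}"       # first field: cluster timestamp
  rest="${rest#*_}"      # drop the timestamp field
  num="${rest%%_*}"      # next field: application sequence number
  printf 'application_%s_%s\n' "$ts" "$num"
)

attempt="attempt_1452850747173_0019_m_000000_0"
app_id="$(app_id_from_attempt "$attempt")"
echo "$app_id"   # prints application_1452850747173_0019

# Fetch the aggregated container logs (stdout, stderr, syslog) if the
# yarn CLI is available; this needs log aggregation to be enabled.
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$app_id"
fi
```

If the stderr section for the failed attempt is empty, that would suggest the mapper command failed before producing any output. A message such as a permission error from docker would point at the user or group the task runs under.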




Re: Docker in Hadoop Streaming

New Contributor

Wow, I just read all my typos. I meant to say that the script was tested using cat sometest | ./


I also meant to ask, "Which user launches the script in hadoop streaming?" 
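
One way to find out which user runs the mapper is to ask from inside the task itself. The sketch below writes a small wrapper script that logs the effective user and groups to stderr, checks that the Docker daemon is reachable, and only then starts the container. The file name docker_mapper_debug.sh, the log prefix and the DOCKER override variable are assumptions made for illustration. The image name mapper_outfirst and the /usr/bin/docker path come from the command in the question. The command to run inside the container is passed through as arguments rather than hard-coded.

```shell
#!/bin/sh
# Create a hypothetical diagnostic wrapper to use as the streaming mapper.
cat > docker_mapper_debug.sh <<'EOF'
#!/bin/sh
# Everything written to stderr lands in the task attempt's stderr log.
log() { echo "streaming-debug: $*" >&2; }

# Report who is really running the mapper and where.
log "user=$(id -un) groups=$(id -Gn)"
log "host=$(uname -n) cwd=$(pwd)"

# Docker path from the original command; overridable for testing.
DOCKER="${DOCKER:-/usr/bin/docker}"

if [ ! -x "$DOCKER" ]; then
  log "docker binary not found or not executable at $DOCKER"
  exit 127
fi

# 'docker info' contacts the daemon, so it fails when this user
# is not allowed to use the Docker socket.
if ! "$DOCKER" info >/dev/null 2>&1; then
  log "cannot reach the Docker daemon as $(id -un)"
  exit 126
fi

log "docker reachable, starting container"
# The wrapper's arguments are used as the command inside the container.
exec "$DOCKER" run -i mapper_outfirst "$@"
EOF
chmod +x docker_mapper_debug.sh
```

The wrapper would be shipped with -file docker_mapper_debug.sh and named in -mapper, followed by the command to run inside the container. The user= and groups= lines then appear in the stderr log of each task attempt. If docker is missing from the reported groups even though the user was added to the group, the process that launches the tasks was probably started before the group change and may need a restart.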
