New Contributor
Posts: 2
Registered: ‎02-03-2016

Docker in Hadoop Streaming

I'm having trouble getting a docker image with an application to run as the mapper in hadoop streaming, something similar to the job shown below.

The docker application and code have been tested and run on a single-node Ubuntu cluster with hadoop 2.7.1. I'm trying to get the same application running on cloudera 5.5.1 with hadoop 2.6.


The hadoop streaming job is:

hadoop jar $HADOOP_HOME/jars/hadoop-streaming-2.6.0-cdh5.5.1.jar \
  -D mapred.reduce.tasks=0 \
  -D \
  -input input \
  -output output \
  -file \
  -mapper "/usr/bin/docker run -i mapper_outfirst /opt/"

cat sometest | ./


I'm getting the error:

16/02/03 10:30:37 INFO mapreduce.Job: Task Id : attempt_1452850747173_0019_m_000000_0, Status : FAILED

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
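That exit code comes from the mapper command itself: Hadoop Streaming raises the PipeMapRed.waitOutputThreads() error whenever the mapper subprocess exits non-zero, and whatever the subprocess writes to stderr ends up in the task's stderr log under the YARN container logs. One way to make the failure visible is to wrap the real mapper command in a small logging script. This is a hypothetical sketch (run_and_log.sh is my name, not from the job above):

```shell
# Hypothetical wrapper: runs its arguments as the real mapper command
# and reports who ran it and how it exited on stderr, which lands in
# the YARN task's stderr log.
cat > run_and_log.sh <<'EOF'
#!/bin/sh
echo "mapper user: $(id -un)" >&2
"$@"
rc=$?
echo "mapper command exited with $rc" >&2
exit $rc
EOF
chmod +x run_and_log.sh

# In the streaming job this would wrap the docker call, e.g.
#   -mapper "./run_and_log.sh /usr/bin/docker run -i mapper_outfirst /opt/..."
# Locally it can wrap anything that reads stdin:
printf 'x\n' | ./run_and_log.sh cat
```

If docker itself fails to start (permissions, daemon not reachable), its error message and the non-zero exit status will show up in the task log instead of being swallowed.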


Looking at /var/log/messages | grep docker, it looks like the mapper never launches docker, since there are no kernel calls. I added cloudera-scm, yarn, oozie, mapred, hdfs and flume to the docker group. The user that submits the job is also in the docker group. Which user launches the script? Is there any way to see whether the script is actually launched in hadoop streaming? Other hadoop streaming jobs have completed where the mapper was a python or bash script.
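One way to answer both questions at once (which user runs the mapper, and whether the mapper is launched at all) is to submit a trivial diagnostic mapper that just prints the effective user and its groups; if that shows up in the job output, the mapper ran, and you can check whether the user is in the docker group. A minimal sketch (the script name is illustrative, not from the post):

```shell
# Hypothetical diagnostic mapper; whoami_mapper.sh is my name for it.
cat > whoami_mapper.sh <<'EOF'
#!/bin/sh
id -un           # the user that launched this mapper
id -Gn           # its groups -- 'docker' must be among them
cat > /dev/null  # drain stdin so the task is not killed for unread input
EOF
chmod +x whoami_mapper.sh

# Locally it behaves like any streaming mapper:
printf '1\n2\n' | ./whoami_mapper.sh
# On the cluster, ship it and run with zero reducers, e.g.
#   -D mapred.reduce.tasks=0 -file whoami_mapper.sh -mapper whoami_mapper.sh
```

Which user that turns out to be depends on the configured container executor: with the DefaultContainerExecutor, YARN tasks run as the NodeManager user (typically yarn), while the LinuxContainerExecutor runs them as the submitting user, so checking the actual output beats guessing.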




Re: Docker in Hadoop Streaming

Wow, just read all my typos. I meant to say that the script was tested using cat sometest | ./


I also meant to ask, "Which user launches the script in hadoop streaming?"