Accepted Solution

Getting error while running mahout job through Oozie




I am trying to run a Mahout job on the Reuters dataset. I have executed the first two steps manually:


Step 1: mahout org.apache.lucene.benchmark.utils.ExtractReuters reuters_dataset reuters-out

Step 2: hdfs dfs -put -f reuters-out /user/cloudera/mahout/kmeans/reuters-out


I want to execute the third step using Oozie, i.e.:

mahout seqdirectory -i /user/cloudera/mahout/kmeans/reuters-out -o /user/cloudera/mahout/kmeans/reuters-out-seqdir -c UTF-8 -chunk 5 -xm mapreduce


I have created the following workflow.xml for this:


<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="mahout-testing"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="mahout-testing">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/cloudera/mahout/kmeans/reuters-out-seqdir"/>
            </prepare>
            <main-class>org.apache.mahout.driver.MahoutDriver</main-class>
            <arg>-i /user/cloudera/mahout/kmeans/reuters-out</arg>
            <arg>-o /user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
            <arg>-xm mapreduce</arg>
        </java>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>



I have also placed all Mahout-related jars under /user/cloudera/app/mahout/kmeans/lib.


But I am getting the following error:

>>> Invoking Main class now >>>

Fetching child yarn jobs
tag id : oozie-e3010996ec4154408748b70b6f44d85e
Child yarn jobs are found -
Main class : org.apache.mahout.driver.MahoutDriver
Arguments :
-i /user/cloudera/mahout/kmeans/reuters-out
-o /user/cloudera/mahout/kmeans/reuters-out-seqdir
-xm mapreduce

Unexpected -xm mapreduce while processing Job-Specific Options:
[--input <input> --output <output> --overwrite --method <method> --chunkSize
<chunkSize> --fileFilterClass <fileFilterClass> --keyPrefix <keyPrefix>
--charset <charset> --method <method> --overwrite --help --tempDir <tempDir>
--startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
--input (-i) input Path to job input directory.
--output (-o) output The directory pathname for
--overwrite (-ow) If present, overwrite the
output directory before
running job
--method (-xm) method The execution method to use:
sequential or mapreduce.
Default is mapreduce


Could you please explain what the issue is here?

Cloudera Employee

Re: Getting error while running mahout job through Oozie

I have a guess: you may need to put each of those tokens in its own arg tag. I don't know Oozie well myself, but something similar is needed in Maven config files. That is, it may be reading "-xm mapreduce" as a single argument rather than as two separate ones.
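
To make that guess concrete, here is a minimal sketch of how the three arg elements from the workflow might be split so that every option and every value gets its own tag. As far as I understand, Oozie hands the value of each arg element to the main class as one argument, which would explain why "-xm mapreduce" shows up as a single unexpected token in the error. This is untested, and it assumes the rest of the workflow stays as it is.

```xml
<!-- Sketch only: one token per arg element, in command-line order -->
<arg>-i</arg>
<arg>/user/cloudera/mahout/kmeans/reuters-out</arg>
<arg>-o</arg>
<arg>/user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
<arg>-xm</arg>
<arg>mapreduce</arg>
```

If the -c UTF-8 and -chunk 5 options from the original command are needed as well, they would presumably have to be split the same way, with "-c", "UTF-8", "-chunk" and "5" each in a separate arg tag.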