Explorer
Accepted Solution

Getting error while running mahout job through Oozie


Hi,

 

I am trying to run a Mahout job on the Reuters dataset. I have executed the first two steps manually:

 

Step 1: mahout org.apache.lucene.benchmark.utils.ExtractReuters reuters_dataset reuters-out

Step 2: hdfs dfs -put -f reuters-out /user/cloudera/mahout/kmeans/reuters-out

 

I want to execute the third step using Oozie, i.e.:

mahout seqdirectory -i /user/cloudera/mahout/kmeans/reuters-out -o /user/cloudera/mahout/kmeans/reuters-out-seqdir -c UTF-8 -chunk 5 -xm mapreduce

 

I have created the following workflow.xml and job.properties for this:

 

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="mahout-testing"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="mahout-testing">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/cloudera/mahout/kmeans/reuters-out-seqdir"/>
            </prepare>
            <main-class>org.apache.mahout.driver.MahoutDriver</main-class>
            <arg>seqdirectory</arg>
            <arg>-i /user/cloudera/mahout/kmeans/reuters-out</arg>
            <arg>-o /user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
            <arg>-xm mapreduce</arg>
        </java>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

 

job.properties:

oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
oozie.wf.application.path=${nameNode}/user/cloudera/app/mahout/kmeans/workflow.xml

 

I have also placed all the Mahout-related JARs under /user/cloudera/app/mahout/kmeans/lib.

 

But I am getting the following error:

>>> Invoking Main class now >>>

Fetching child yarn jobs
tag id : oozie-e3010996ec4154408748b70b6f44d85e
Child yarn jobs are found -
Main class : org.apache.mahout.driver.MahoutDriver
Arguments :
seqdirectory
-i /user/cloudera/mahout/kmeans/reuters-out
-o /user/cloudera/mahout/kmeans/reuters-out-seqdir
-xm mapreduce

Unexpected -xm mapreduce while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --overwrite --method <method> --chunkSize
<chunkSize> --fileFilterClass <fileFilterClass> --keyPrefix <keyPrefix>
--charset <charset> --method <method> --overwrite --help --tempDir <tempDir>
--startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input       Path to job input directory.
  --output (-o) output     The directory pathname for output.
  --overwrite (-ow)        If present, overwrite the output directory before running job
  --method (-xm) method    The execution method to use: sequential or mapreduce. Default is mapreduce

 

Could you please explain what the issue is here?

Cloudera Employee

Re: Getting error while running mahout job through Oozie

I have a guess: you probably need to put each of those tokens in its own <arg> tag. I don't know Oozie well myself, but something similar is needed in Maven config files. That is, it may be reading "-xm mapreduce" as one argument rather than two, which would explain the "Unexpected -xm mapreduce" message.
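As a rough sketch of that guess, here is how the arguments of the java action might look with every flag and every value in its own <arg> element. This is untested and assumes that Oozie passes each <arg> to the main class as exactly one command-line argument. The paths and flags are copied from the original workflow; nothing else in the action would need to change.

```xml
<main-class>org.apache.mahout.driver.MahoutDriver</main-class>
<!-- Assumption: each <arg> becomes one entry in the args array of main() -->
<arg>seqdirectory</arg>
<!-- Flag and value are split, so the parser sees "-i" and the path separately -->
<arg>-i</arg>
<arg>/user/cloudera/mahout/kmeans/reuters-out</arg>
<arg>-o</arg>
<arg>/user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
<arg>-xm</arg>
<arg>mapreduce</arg>
```

If the "-c UTF-8" and "-chunk 5" options from the manual command are also wanted, they would presumably need the same treatment: one <arg> for the flag and another for its value.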
