Getting error while running mahout job through Oozie


Hi,

 

I am trying to run a Mahout job on the Reuters dataset. I have executed the first two steps manually:

 

Step 1: mahout org.apache.lucene.benchmark.utils.ExtractReuters reuters_dataset reuters-out

Step 2: hdfs dfs -put -f reuters-out /user/cloudera/mahout/kmeans/reuters-out

 

I want to execute the third step using Oozie, i.e.:

mahout seqdirectory -i /user/cloudera/mahout/kmeans/reuters-out -o /user/cloudera/mahout/kmeans/reuters-out-seqdir -c UTF-8 -chunk 5 -xm mapreduce

 

I have created the following workflow.xml and job.properties for this:

 

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="mahout-testing"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="mahout-testing">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/cloudera/mahout/kmeans/reuters-out-seqdir"/>
            </prepare>
            <main-class>org.apache.mahout.driver.MahoutDriver</main-class>
            <arg>seqdirectory</arg>
            <arg>-i /user/cloudera/mahout/kmeans/reuters-out</arg>
            <arg>-o /user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
            <arg>-xm mapreduce</arg>
        </java>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

 

job.properties:

oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
oozie.wf.application.path=${nameNode}/user/cloudera/app/mahout/kmeans/workflow.xml

 

I have also placed all the Mahout-related jars under /user/cloudera/app/mahout/kmeans/lib.

 

But I am getting the following error:

>>> Invoking Main class now >>>

Fetching child yarn jobs
tag id : oozie-e3010996ec4154408748b70b6f44d85e
Child yarn jobs are found -
Main class : org.apache.mahout.driver.MahoutDriver
Arguments :
seqdirectory
-i /user/cloudera/mahout/kmeans/reuters-out
-o /user/cloudera/mahout/kmeans/reuters-out-seqdir
-xm mapreduce

Unexpected -xm mapreduce while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --overwrite --method <method> --chunkSize
<chunkSize> --fileFilterClass <fileFilterClass> --keyPrefix <keyPrefix>
--charset <charset> --method <method> --overwrite --help --tempDir <tempDir>
--startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input       Path to job input directory.
  --output (-o) output     The directory pathname for output.
  --overwrite (-ow)        If present, overwrite the output directory before running job
  --method (-xm) method    The execution method to use: sequential or mapreduce. Default is mapreduce

 

Could you please explain what the issue is here?

1 ACCEPTED SOLUTION


Re: Getting error while running mahout job through Oozie

My guess is that each of those tokens needs to be its own arg tag. I don't know Oozie well myself, but Maven config files need something similar. As written, "-xm mapreduce" is probably being read as one argument rather than two, which would match the "Unexpected -xm mapreduce" message in your log. If so, the same applies to the -i and -o lines.
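
As a sketch of that suggestion, the java action from the question could be rewritten so that every flag and every value sits in its own arg element. The paths, main class, and property names are copied from the workflow.xml above. The split itself is the assumption being tested and has not been run against this cluster.

```xml
<java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
        <delete path="${nameNode}/user/cloudera/mahout/kmeans/reuters-out-seqdir"/>
    </prepare>
    <main-class>org.apache.mahout.driver.MahoutDriver</main-class>
    <!-- Assumption: each arg element reaches the main class as exactly one
         argument, so a flag and its value must be two separate elements. -->
    <arg>seqdirectory</arg>
    <arg>-i</arg>
    <arg>/user/cloudera/mahout/kmeans/reuters-out</arg>
    <arg>-o</arg>
    <arg>/user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
    <arg>-xm</arg>
    <arg>mapreduce</arg>
</java>
```

If you also want the -c UTF-8 and -chunk 5 options from the manual command, they would presumably need the same treatment: one arg for the flag and one for its value.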
