Created on 06-27-2016 08:45 AM - edited 09-16-2022 03:27 AM
Hi,
I am trying to run a Mahout job on the Reuters dataset. I have executed the first two steps manually:
Step 1: mahout org.apache.lucene.benchmark.utils.ExtractReuters reuters_dataset reuters-out
Step 2: hdfs dfs -put -f reuters-out /user/cloudera/mahout/kmeans/reuters-out
I want to execute the third step using Oozie, i.e.:
mahout seqdirectory -i /user/cloudera/mahout/kmeans/reuters-out -o /user/cloudera/mahout/kmeans/reuters-out-seqdir -c UTF-8 -chunk 5 -xm mapreduce
I have created the following workflow.xml and job.properties for this:
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="mahout-testing"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="mahout-testing">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/cloudera/mahout/kmeans/reuters-out-seqdir"/>
            </prepare>
            <main-class>org.apache.mahout.driver.MahoutDriver</main-class>
            <arg>seqdirectory</arg>
            <arg>-i /user/cloudera/mahout/kmeans/reuters-out</arg>
            <arg>-o /user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
            <arg>-xm mapreduce</arg>
        </java>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
job.properties:
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
oozie.wf.application.path=${nameNode}/user/cloudera/app/mahout/kmeans/workflow.xml
I have also placed all Mahout-related jars under /user/cloudera/app/mahout/kmeans/lib.
But I am getting the following error:
>>> Invoking Main class now >>>
Fetching child yarn jobs
tag id : oozie-e3010996ec4154408748b70b6f44d85e
Child yarn jobs are found -
Main class : org.apache.mahout.driver.MahoutDriver
Arguments :
seqdirectory
-i /user/cloudera/mahout/kmeans/reuters-out
-o /user/cloudera/mahout/kmeans/reuters-out-seqdir
-xm mapreduce
Unexpected -xm mapreduce while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --overwrite --method <method> --chunkSize
<chunkSize> --fileFilterClass <fileFilterClass> --keyPrefix <keyPrefix>
--charset <charset> --method <method> --overwrite --help --tempDir <tempDir>
--startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
--input (-i) input Path to job input directory.
--output (-o) output The directory pathname for
output.
--overwrite (-ow) If present, overwrite the
output directory before
running job
--method (-xm) method The execution method to use:
sequential or mapreduce.
Default is mapreduce
Could you please explain what the issue is here?
Created 06-27-2016 09:28 AM
I have a guess: you may need to put each of those tokens in a separate arg tag. I don't know Oozie well myself, but something similar is needed in Maven config files. That is, it may be reading "-xm mapreduce" as one argument rather than two, which would match the "Unexpected -xm mapreduce" message in your log.
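To illustrate the guess, here is a rough sketch of how the java action's arguments might look with every flag and every value in its own arg element. It assumes Oozie hands each arg element to the main class as exactly one argument; only the arg lines differ from the workflow in the question, and I have not run this myself.

```xml
<java>
    <!-- job-tracker, name-node and prepare stay as in the original workflow -->
    <main-class>org.apache.mahout.driver.MahoutDriver</main-class>
    <arg>seqdirectory</arg>
    <!-- each flag and its value are separate arguments (assumption: one arg element = one argument) -->
    <arg>-i</arg>
    <arg>/user/cloudera/mahout/kmeans/reuters-out</arg>
    <arg>-o</arg>
    <arg>/user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
    <arg>-xm</arg>
    <arg>mapreduce</arg>
</java>
```

If that works, the remaining options from your manual command (-c UTF-8 and -chunk 5) would presumably need the same treatment, one arg element per token.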