Member since
07-09-2015
06-27-2016
08:45 AM
Hi, I am trying to run a Mahout job on the Reuters dataset. I have executed the first two steps manually:

Step 1:
    mahout org.apache.lucene.benchmark.utils.ExtractReuters reuters_dataset reuters-out

Step 2:
    hdfs dfs -put -f reuters-out /user/cloudera/mahout/kmeans/reuters-out

I want to execute the third step using Oozie, i.e.:

    mahout seqdirectory -i /user/cloudera/mahout/kmeans/reuters-out -o /user/cloudera/mahout/kmeans/reuters-out-seqdir -c UTF-8 -chunk 5 -xm mapreduce

I have created the workflow.xml and job.properties below for this:

    <workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
        <start to="mahout-testing"/>
        <kill name="Kill">
            <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="mahout-testing">
            <java>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <prepare>
                    <delete path="${nameNode}/user/cloudera/mahout/kmeans/reuters-out-seqdir"/>
                </prepare>
                <main-class>org.apache.mahout.driver.MahoutDriver</main-class>
                <arg>seqdirectory</arg>
                <arg>-i /user/cloudera/mahout/kmeans/reuters-out</arg>
                <arg>-o /user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
                <arg>-xm mapreduce</arg>
            </java>
            <ok to="End"/>
            <error to="Kill"/>
        </action>
        <end name="End"/>
    </workflow-app>

job.properties:

    oozie.use.system.libpath=True
    security_enabled=False
    dryrun=False
    jobTracker=localhost:8032
    nameNode=hdfs://quickstart.cloudera:8020
    oozie.wf.application.path=${nameNode}/user/cloudera/app/mahout/kmeans/workflow.xml

I have also placed all Mahout-related jars under /user/cloudera/app/mahout/kmeans/lib.
But I am getting the error below:

    >>> Invoking Main class now >>>

    Fetching child yarn jobs
    tag id : oozie-e3010996ec4154408748b70b6f44d85e
    Child yarn jobs are found -
    Main class : org.apache.mahout.driver.MahoutDriver
    Arguments  :
        seqdirectory
        -i /user/cloudera/mahout/kmeans/reuters-out
        -o /user/cloudera/mahout/kmeans/reuters-out-seqdir
        -xm mapreduce

    Unexpected -xm mapreduce while processing Job-Specific Options:
    Usage: [--input <input> --output <output> --overwrite --method <method> --chunkSize <chunkSize> --fileFilterClass <fileFilterClass> --keyPrefix <keyPrefix> --charset <charset> --method <method> --overwrite --help --tempDir <tempDir> --startPhase <startPhase> --endPhase <endPhase>]

    Job-Specific Options:
        --input (-i) input        Path to job input directory.
        --output (-o) output      The directory pathname for output.
        --overwrite (-ow)         If present, overwrite the output directory before running job
        --method (-xm) method     The execution method to use: sequential or mapreduce. Default is mapreduce

Could you please explain what the issue is here?
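One likely cause, given how Oozie's java action handles arguments: each <arg> element is passed to the main class as a single argv token, so <arg>-xm mapreduce</arg> arrives as the one string "-xm mapreduce", which the Mahout option parser then rejects ("Unexpected -xm mapreduce"). A sketch of the same action with each flag and its value split into separate <arg> elements (paths taken unchanged from the post):

```xml
<main-class>org.apache.mahout.driver.MahoutDriver</main-class>
<arg>seqdirectory</arg>
<arg>-i</arg>
<arg>/user/cloudera/mahout/kmeans/reuters-out</arg>
<arg>-o</arg>
<arg>/user/cloudera/mahout/kmeans/reuters-out-seqdir</arg>
<arg>-xm</arg>
<arg>mapreduce</arg>
```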
Labels:
- Apache Oozie
01-28-2016
06:27 AM
Hi, how do I find out the number of MapReduce jobs for a single Hive query? Could you please let me know how many MapReduce jobs will be launched if I execute the query below, and the sequence of these jobs?

    select col1, col2, sum(col3), count(col4), avg(col5) from sample_table where col1=condition;

Thanks in advance.
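One way to check this yourself is Hive's EXPLAIN statement, which prints the query plan including the stages that run as MapReduce jobs. A sketch against the query from the post (note that, as written, the query would also need a GROUP BY for the non-aggregated columns, so one is added here as an assumption):

```sql
EXPLAIN
SELECT col1, col2, sum(col3), count(col4), avg(col5)
FROM sample_table
WHERE col1 = condition
GROUP BY col1, col2;
```

Counting the "Map Reduce" stage entries in the EXPLAIN output gives the number of jobs, and the stage dependencies give their sequence.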
Labels:
- Apache Hive
- MapReduce
07-28-2015
06:43 AM
Hi Harsh, the information provided for the Java action is clear to me, but I have a doubt regarding the MapReduce action. I am using only a MapReduce action in my workflow, and I am placing only my Mapper class and Reducer class in the .jar file. I want to pass all the properties and parameters through the Oozie workflow. Can I now get the value of the "var" variable, i.e. "2", in my Mapper using the code below in my mapper class?

    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        var_val = conf.get("var");
    }
07-27-2015
02:58 AM
Hi, I am running my Oozie workflow through the command prompt. I am assigning a value to a parameter "var" in my workflow:

    <property>
        <name>var</name>
        <value>2</value>
    </property>

Now how can I access the variable's value in my Java Mapper program? Will the code below work for me?

    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        var_val = conf.get("var");
    }
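For context, properties placed in an action's <configuration> block are merged into the job's Configuration, so conf.get("var") inside the mapper's setup() should see the value. A minimal sketch of where the <property> element would sit inside a mapreduce action (the action name and transition targets are placeholders):

```xml
<action name="my-mr-action">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>var</name>
                <value>2</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="End"/>
    <error to="Kill"/>
</action>
```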
Labels:
- Apache Oozie
07-16-2015
07:10 AM
Hi, your reply does help, thanks. But I still have doubts regarding passing parameters via the Hue UI:

1. If I pass the parameter value using the Properties field, will it end up in the workflow under an <arg> tag or a <property> tag?
2. What is the use of the Parameters field under Workflow Settings?

Also, could you please give some insight regarding my use case: I want to run a Java MapReduce program using the mapreduce action. I have a jar with multiple Mapper and Reducer classes defined, and I want to use Oozie as the driver for my Mapper and Reducer programs, while at the same time passing argument values through Oozie to my Mappers and Reducers. How can I achieve this using the Hue UI to create the workflow?
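As a hedged sketch of what Hue typically generates: a value entered in an action's Properties field usually lands as a <property> entry under that action's <configuration>, not as an <arg>, while an entry under Workflow Settings > Parameters defines a workflow-level parameter that is prompted for at submission and referenced as ${name}. The "var" name below is a placeholder:

```xml
<configuration>
    <property>
        <name>var</name>
        <value>2</value>
    </property>
</configuration>
```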
07-16-2015
06:34 AM
Hi, I have a requirement where I need to share data between multiple mapreduce actions in a single workflow, and between different workflows within a single coordinator. I am creating and running my workflows using Hue. I have read about the capture-output element, but this element is not available under the mapreduce action. Even if I use a Java action to run my mapreduce programs, the capture-output element has a size constraint of 2 KB by default. Could you please let me know by how much I can increase the size? Thanks
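For reference, the 2 KB capture-output limit is governed by the oozie.action.max.output.data property in oozie-site.xml (a server-side setting, so the Oozie server must be restarted after changing it). A sketch raising it to 8 KB, where 8192 is just an example value:

```xml
<property>
    <name>oozie.action.max.output.data</name>
    <value>8192</value>
</property>
```

Keep in mind the captured output still travels through the Oozie server, so it is meant for small control data, not bulk data sharing between actions.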
Labels:
- Apache Oozie
- Cloudera Hue
- MapReduce
07-16-2015
06:02 AM
Hi, thanks for the reply. I am trying to use Hue for creating and running the Oozie workflows. The properties that I am trying to pass (e.g. mapreduce.map.class and mapreduce.reduce.class) are not getting written to the job.properties file, and hence I cannot see them in workflow.xml either. I have a few questions:

1. Where should I define a parameter and its value in the Hue UI for a workflow?
2. Do I need to upload the .jar into the workflow's lib directory, and what is the need for that?

My use case: I want to run a Java MapReduce program using the mapreduce action. I have a jar with multiple Mapper and Reducer classes defined, and I want to use Oozie as the driver for my Mapper and Reducer programs, while at the same time passing argument values through Oozie to my Mappers and Reducers. Could you please let me know how to achieve this using Hue and Oozie? Thanks
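One common pattern for driving new-API (org.apache.hadoop.mapreduce) classes from a plain mapreduce action is to enable the new API explicitly and name the classes via configuration properties; the jar containing the classes goes in the workflow's lib/ directory on HDFS so Oozie puts it on the job classpath. A sketch, where the class names are placeholders and the exact property keys vary by Hadoop version (e.g. mapreduce.job.map.class on Hadoop 2):

```xml
<configuration>
    <property>
        <name>mapred.mapper.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.reducer.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.map.class</name>
        <value>com.example.MyMapper</value>
    </property>
    <property>
        <name>mapreduce.reduce.class</name>
        <value>com.example.MyReducer</value>
    </property>
</configuration>
```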
07-10-2015
05:29 AM
Hi, I am new to Oozie and have some doubts regarding creating workflows. I have a Java MapReduce program which expects 3 parameters to be passed: the input directory, the output directory, and the field position of the record in the file that I need to pass to the program as a parameter:

    hadoop jar /home/cloudera/ooziesample.jar /user/cloudera/wordcount/input/ /user/cloudera/wordcount/output 2

Now I want to create a workflow that runs this jar as a mapreduce action, and I want to pass these 3 parameters via the Oozie workflow: the input directory, the output directory, and the field position of the record. Could you please help me with how to do this?
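One hedged sketch of wiring those three values through a mapreduce action's <configuration>: the directories go into the standard input/output dir properties, and the field position travels as a custom property that the mapper can read back with context.getConfiguration().get("fieldPosition"). The "fieldPosition" name is illustrative, and the dir property keys shown are the Hadoop 2 names (older versions use mapred.input.dir / mapred.output.dir):

```xml
<map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <property>
            <name>mapreduce.input.fileinputformat.inputdir</name>
            <value>/user/cloudera/wordcount/input/</value>
        </property>
        <property>
            <name>mapreduce.output.fileoutputformat.outputdir</name>
            <value>/user/cloudera/wordcount/output</value>
        </property>
        <property>
            <name>fieldPosition</name>
            <value>2</value>
        </property>
    </configuration>
</map-reduce>
```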
Labels:
- Apache Hadoop
- Apache Oozie
- MapReduce