Created 06-07-2016 11:23 PM
I used the Falcon UI to create an input feed and an output feed. Both are HDFS directories: the input directory (inputDir) is populated by Flume, and the output directory (outputDir) is to be populated by my Oozie workflow shell action.
Next I created a Falcon process which references my Oozie workflow.xml file, and I added an input named inputDir and an output named outputDir.
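For reference, my process entity looks roughly like this (the entity names, feed names, cluster name, validity window, and workflow path below are illustrative, not my exact values):

<process name="word-count-process" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="primary-cluster">
            <validity start="2016-06-01T00:00Z" end="2017-06-01T00:00Z"/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>FIFO</order>
    <frequency>hours(1)</frequency>
    <inputs>
        <!-- the input name is what I expected to resolve as ${inputDir} -->
        <input name="inputDir" feed="input-feed" start="now(0,0)" end="now(0,0)"/>
    </inputs>
    <outputs>
        <!-- the output name is what I expected to resolve as ${outputDir} -->
        <output name="outputDir" feed="output-feed" instance="now(0,0)"/>
    </outputs>
    <workflow engine="oozie" path="/user/falcon/apps/wordcount"/>
</process>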
Below is the shell action I am using in my Oozie workflow.xml:
<shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <exec>/usr/hdp/current/spark-client/bin/spark-submit</exec>
    <argument>--master</argument>
    <argument>yarn-client</argument>
    <argument>wordCountWArgs.py</argument>
    <argument>${inputDir}</argument>
    <argument>${outputDir}</argument>
    <env-var>HADOOP_USER_NAME=falcon</env-var>
    <file>${nameNode}/user/falcon/apps/scripts/wordCountWArgs.py#wordCountWArgs.py</file>
    <capture-output/>
</shell>
The job fails, and I can see a message in the log saying it couldn't resolve ${inputDir} and ${outputDir}.
Does anyone have any idea where I am going wrong?
Thanks in advance.
Created 06-13-2016 08:09 PM
Oozie cannot directly reference Falcon objects. To use Oozie with Falcon, create an Input Feed, an Output Feed, and a Falcon Oozie process. Falcon will then generate the Oozie workflow XML.
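As a sketch of the feed side, a minimal Falcon feed entity that backs one of those directories could look like this (feed name, cluster name, paths, and retention values here are placeholders, not values from your setup):

<feed name="input-feed" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <clusters>
        <cluster name="primary-cluster" type="source">
            <validity start="2016-06-01T00:00Z" end="2017-06-01T00:00Z"/>
            <retention limit="days(7)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <!-- the HDFS path Falcon manages and passes to the process as an instance path -->
        <location type="data" path="/user/falcon/inputDir/${YEAR}-${MONTH}-${DAY}"/>
    </locations>
    <ACL owner="falcon" group="users" permission="0755"/>
    <schema location="/none" provider="none"/>
</feed>

The process entity's <input>/<output> elements then reference these feeds by name, and Falcon resolves the concrete instance paths when it generates and submits the Oozie workflow.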