
How to use Spark engine with Falcon

I am using HDP 2.4, Spark 1.6.2.

I've recently installed Falcon and I was able to deploy the primary and backup clusters. I've also successfully run a mirror job.

Now I'm working on scheduling a Spark app. When I create a process through the UI, I can only choose Oozie, Pig, or Hive; Spark is not available as an engine. When I try to add it by editing the XML directly, the spark-attributes get cleared.

I am using an XML definition like the one below:

<process xmlns='uri:falcon:process:0.1' name='spark-process'>
    <clusters>
        <cluster name='primaryCluster'>
            <validity start='2017-07-03T00:00Z' end='2017-07-05T00:00Z'/>
        </cluster>
    </clusters>
    <workflow engine="spark" path="/app/spark"/>
    <spark-attributes>
        <name>Test Spark Wordcount</name>
        <spark-opts>--num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1</spark-opts>
    </spark-attributes>
    <retry policy='periodic' delay='minutes(3)' attempts='3'/>
    <ACL owner='ambari-qa' group='users' permission='0755'/>
</process>
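
For comparison, on Falcon versions that do support the spark engine, a complete process definition also carries the mandatory parallel/order/frequency elements and a fully populated spark-attributes block. This is only a sketch based on the Falcon documentation's spark example; the master, class, jar, and paths are placeholders, not values from my setup:

```xml
<process xmlns='uri:falcon:process:0.1' name='spark-process'>
    <clusters>
        <cluster name='primaryCluster'>
            <validity start='2017-07-03T00:00Z' end='2017-07-05T00:00Z'/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>LIFO</order>
    <frequency>days(1)</frequency>
    <workflow engine='spark' path='/app/spark'/>
    <spark-attributes>
        <master>yarn-cluster</master>
        <name>Test Spark Wordcount</name>
        <class>org.apache.spark.examples.JavaWordCount</class>
        <jar>/app/spark/lib/spark-examples.jar</jar>
        <spark-opts>--num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1</spark-opts>
    </spark-attributes>
    <retry policy='periodic' delay='minutes(3)' attempts='3'/>
    <ACL owner='ambari-qa' group='users' permission='0755'/>
</process>
```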

Is there something I need to do before using Spark with Falcon, or is this functionality not supported in these component versions?

See the attached screenshots to visualise the issue.




Re: How to use Spark engine with Falcon

I just found this error in falcon.application.log:

ERROR - [1388728910@qtp-1886491834-668 - c186eb8d-ef42-42f1-be4b-076e6ee27a5c:ambari-qa:POST//entities/submit/process] ~ Action failed: Bad Request Error: javax.xml.bind.UnmarshalException - with linked exception: [org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 383; cvc-enumeration-valid: Value 'spark' is not facet-valid with respect to enumeration '[oozie, pig, hive]'. It must be a value from the enumeration.] (FalconWebException:83)
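
The exception suggests that the Falcon build shipped with HDP 2.4 validates the workflow `engine` attribute against an XSD enumeration that only allows oozie, pig, and hive, which would explain both the missing UI option and the rejected XML. The restriction presumably looks something like the following in Falcon's process schema (an illustrative sketch reconstructed from the error message, not the exact file contents):

```xml
<!-- process-0.1.xsd (illustrative): engine="spark" fails unmarshalling
     because it is not one of the permitted enumeration values -->
<xs:simpleType name="engine-type">
    <xs:restriction base="xs:string">
        <xs:enumeration value="oozie"/>
        <xs:enumeration value="pig"/>
        <xs:enumeration value="hive"/>
    </xs:restriction>
</xs:simpleType>
```

If that is the case, the spark engine is simply not present in this Falcon version's schema, and a newer Falcon release that includes spark in the enumeration would be needed.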