
Problem running spark with oozie

New Contributor

I am running a Spark Streaming application inside Oozie. It runs fine in the dev environment, where the Spark version is 1.3.0, but in another region it fails with the error below.

2016-04-27 05:39:58,142 ERROR [main] Error starting MRAppMaster
java.lang.IllegalArgumentException: Invalid ContainerId: container_e31_1461603486630_0010_01_000001

This region has Spark 1.5 on CDH 5.5.0. I understand the failure has something to do with the Spark version, but the spark-core jar in the Oozie share lib is also version 1.5. Below are my workflow.xml and properties file. I also tried adding the classpath directly in workflow.xml, but that does not help either.

<workflow-app xmlns="uri:oozie:workflow:0.1" name="sample">
    <start to="spark-stream" />
    <action name="spark-stream">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <spark-opts>--executor-cores 100 --executor-memory 12G --driver-memory 4G --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hadoop/client/* spark.driver.extraClassPath=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hadoop/client/*</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="mail_users"/>
    </action>
    <action name="mail_users">
        <email xmlns="uri:oozie:email-action:0.1">
            <subject>Job failed</subject>
            <body>Please check</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <end name="end" />
</workflow-app>

Properties file:



The job jar in the lib folder uses Spark 1.5.0 from CDH 5.5.0. Could someone suggest a workaround or a solution?


Re: Problem running spark with oozie

Cloudera Employee

Does the code work when submitted through spark-submit?


Most likely, one of your dependencies is pulling in an older version of the Hadoop/YARN libraries. Look for hadoop or yarn jar files in your package. Also, the "" file should contain the version information. CDH 5.5 is based on Hadoop/YARN 2.6.0, and ideally you will be using the Cloudera-provided dependency packages. The version in that case should be "2.6.0-cdh5.5". More information on the dependency jars that Cloudera provides can be found here:
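
One way to carry out the check suggested above is a small script that looks for Hadoop/YARN jars in the workflow's lib folder and inside the job jar. This is only a rough sketch: the function names (find_hadoop_jars, has_yarn_classes, scan_lib_dir) are made up for illustration, the file-name pattern assumes the usual artifact-version.jar naming, and the "2.6.0-cdh5.5" prefix is the version quoted above. The class check is a heuristic for fat jars that bundle YARN classes directly.

```python
import re
import zipfile
from pathlib import Path

# Assumed naming convention: <artifact>-<version>.jar, e.g. hadoop-yarn-common-2.3.0.jar
HADOOP_JAR = re.compile(r"(?:^|/)((?:hadoop|yarn)[\w.-]*?)-(\d[\w.-]*)\.jar$")

# Version prefix expected on CDH 5.5, as quoted in the reply above
EXPECTED_PREFIX = "2.6.0-cdh5.5"


def find_hadoop_jars(names):
    """Return (artifact, version) pairs for names that look like Hadoop/YARN jars."""
    hits = []
    for name in names:
        match = HADOOP_JAR.search(name)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


def has_yarn_classes(names):
    """Heuristic: True if YARN classes are bundled directly (typical of a fat jar)."""
    return any(
        name.startswith("org/apache/hadoop/yarn/") and name.endswith(".class")
        for name in names
    )


def is_expected_version(version, expected_prefix=EXPECTED_PREFIX):
    """True if the version string carries the expected CDH suffix."""
    return version.startswith(expected_prefix)


def scan_lib_dir(lib_dir):
    """Inspect every jar in lib_dir: its own file name plus the entries inside it."""
    report = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            entries = archive.namelist()
        hits = find_hadoop_jars([jar.name] + entries)
        suspicious = [h for h in hits if not is_expected_version(h[1])]
        if suspicious or has_yarn_classes(entries):
            report[jar.name] = {
                "unexpected_jars": suspicious,
                "bundled_yarn_classes": has_yarn_classes(entries),
            }
    return report
```

Running scan_lib_dir on a local copy of the workflow's lib folder returns a dictionary keyed by jar name. Any entry it reports is a candidate for the older Hadoop/YARN dependency and is worth excluding from the package or rebuilding against the Cloudera-provided artifacts.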

