
How to run Spark job from Oozie Workflow on HDP/hue

Expert Contributor

I have created a small Java program for Spark. It works with the "spark-submit" command, and I would like to run it from an Oozie workflow. HDP 2.3 appears to be able to run Spark jobs from Oozie workflows, but in Hue's GUI there is no Spark job option to include in a workflow. How do I do this?

1 ACCEPTED SOLUTION

Expert Contributor

I figured it out by myself. Here are the steps:

1: download the sandbox or use your existing sandbox (HDP 2.3.2)

2: create a workflow in Hue's Oozie editor

3: click "Edit Properties" and add a property in Oozie parameters: oozie.action.sharelib.for.spark = spark,hcatalog,hive

4: click Save button

5: add a shell action and fill in the name field. The shell command field may be required; enter any string as a temporary placeholder and save the shell action. We will come back and edit it later.

6: close the workflow and open the file browser; click oozie, then workspaces. Identify the _hue_xxx directory for the workflow you are creating.

7: create a lib directory there (steps 7-9 can also be done from the command line; a sketch follows the list)

8: copy your jar file that contains the Spark Java program into the lib directory.

9: move up one directory and copy in a shell file (e.g. script.sh) that contains:

spark-submit --class JDBCTest spark-test-1.0.jar

spark-test-1.0.jar is the file you uploaded to the lib directory.

10: go back to the workflow web page

11: open the shell action and set the Shell command field by selecting the shell file (e.g. script.sh)

12: also populate the Files field and add the shell file (e.g. script.sh) again

13: click Done

14: save the workflow

15: submit the workflow

16: it should run.
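For reference, steps 7-9 can also be done from the command line instead of Hue's file browser. This is only a sketch; the workspace path below is a placeholder for the _hue_xxx directory identified in step 6:

# placeholder for the workflow workspace found in step 6
WORKSPACE=/user/hue/oozie/workspaces/_hue_xxx

# step 7: create the lib directory inside the workspace
hdfs dfs -mkdir -p $WORKSPACE/lib

# step 8: upload the jar containing the Spark program
hdfs dfs -put spark-test-1.0.jar $WORKSPACE/lib/

# step 9: upload the wrapper script into the workspace itself
hdfs dfs -put script.sh $WORKSPACE/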

My Java program does something like this:

// "con" is a java.sql.Connection opened earlier with the Hive JDBC driver
Statement stmt = con.createStatement();
String sql = "SELECT s07.description AS job_category, s07.salary, s08.salary, "
    + "(s08.salary - s07.salary) AS salary_difference "
    + "FROM sample_07 s07 JOIN sample_08 s08 ON (s07.code = s08.code) "
    + "WHERE s07.salary < s08.salary "
    + "SORT BY s08.salary - s07.salary DESC LIMIT 5";
ResultSet res = stmt.executeQuery(sql);

It uses the Hive JDBC driver.


13 REPLIES

Master Guru

Expert Contributor

I'm new to the HDP/Big Data environment, and while I understand what is described there, I don't know how to translate it to an HDP 2.3 environment. Also, I would like to run it from Hue's Oozie workflow editor in the GUI. Could you explain step by step?

Thanks a lot.


There is a bug which requires you to manually copy the Hive/HCatalog jars into the Spark sharelib directory in order to get this to work:

https://issues.apache.org/jira/browse/OOZIE-2277
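For anyone hitting that issue, the manual workaround is roughly the following sketch; the local Hive lib path, the lib_<timestamp> directory name, and the Oozie host are placeholders that will differ per cluster:

# copy the Hive/HCatalog jars from the local Hive client into the Spark sharelib on HDFS
hdfs dfs -put /usr/hdp/current/hive-client/lib/*.jar /user/oozie/share/lib/lib_<timestamp>/spark/

# tell Oozie to pick up the updated sharelib without a restart
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate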

Expert Contributor

It looks like HDP 2.3.2 already has this patch.

Master Mentor

@Ali Bajwa @Shigeru Takehara when you specify oozie.action.sharelib.for.spark = spark,hcatalog,hive, it will include those libraries with the Spark action. A trick I learned the hard way :).
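If it helps, you can also verify which sharelib names exist on a given cluster (and which jars the spark one bundles) from the Oozie CLI; the Oozie URL below is a placeholder:

# list the sharelibs Oozie knows about (should include spark, hive, hcatalog)
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist

# list the jars bundled in the spark sharelib
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist spark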

Explorer

I looked at the SparkMain class contained within the oozie-sharelib-spark-4.2.0.2.3.4.1-10.jar that comes with the Spark 1.6 TP, and it does not appear to have the fix for https://issues.apache.org/jira/browse/OOZIE-2277
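In case anyone else wants to repeat that check, a rough way to get at the deployed jar (the sharelib path is a placeholder; the lib_<timestamp> name varies per install):

# fetch the spark sharelib jar from HDFS
hdfs dfs -get /user/oozie/share/lib/lib_<timestamp>/spark/oozie-sharelib-spark-4.2.0.2.3.4.1-10.jar .

# confirm the jar contains the SparkMain class, then extract/decompile it to inspect the fix
unzip -l oozie-sharelib-spark-4.2.0.2.3.4.1-10.jar | grep SparkMain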

Master Mentor

@cmuchinsky the Spark Oozie action is not supported in HDP at this moment. This is explicitly stated in our Spark user guide.

Explorer

Understood @Artem Ervits; however, your previous comment seems to indicate you have some knowledge of the Oozie 'oozie.action.sharelib.for.spark' property, so I wanted to clarify that the comment by @Shigeru Takehara indicating OOZIE-2277 was fixed doesn't seem to jibe with the HDP 2.3.4 or 2.3.4.1-TP deliverables.

While Spark via Oozie isn't officially supported, the Hortonworks support team did provide us with a procedure to update the Oozie sharelib for Spark to get it working with 2.3.4; however, that no longer seems to work with the Spark 1.6-enabled 2.3.4.1-TP version.

Master Mentor

@cmuchinsky I would love to see the steps engineering provided. In general, just because we don't officially support something doesn't mean it cannot be done; it just means sometimes you have to dig deeper, and with Oozie, I have limited patience :).