
Oozie - Scheduling - Pig and Hive Scripts and Spark

Rising Star

Hi all,

Is it possible to create a workflow in Oozie that automatically executes Hive, Pig, and Spark scripts, so I can automate my analytics process?

Many thanks!

1 ACCEPTED SOLUTION

Master Guru
@João Souza

Yes! You can use Hive/Pig/Spark actions in whatever order your requirement calls for and control the flow with the ok/error transitions (for example, if the Hive action succeeds, move to the Pig node, and so on; otherwise go to the fail node).
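
For example, a minimal workflow.xml for this first approach might look roughly like the sketch below. The script names, the Spark class and jar, and the ${jobTracker}/${nameNode} parameters are placeholders you would replace with your own values (usually via job.properties):

<workflow-app name="analytics-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-node"/>

    <!-- Step 1: Hive script; on success move to the Pig node, on error go to fail -->
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>clean_data.hql</script>
        </hive>
        <ok to="pig-node"/>
        <error to="fail"/>
    </action>

    <!-- Step 2: Pig script -->
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>transform.pig</script>
        </pig>
        <ok to="spark-node"/>
        <error to="fail"/>
    </action>

    <!-- Step 3: Spark job -->
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>analytics-spark</name>
            <class>com.example.Analytics</class>
            <jar>${nameNode}/apps/analytics/lib/analytics.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <fail name="fail">
        <message>Workflow failed at [${wf:lastErrorNode()}]</message>
    </fail>
    <end name="end"/>
</workflow-app>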

OR

You can create a shell script that calls your Hive/Pig/Spark scripts in the appropriate order, and use Oozie's shell action to execute that script.
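
If you go the shell route, the workflow only needs a single shell action that ships and runs your wrapper script. This is a rough sketch, assuming a hypothetical wrapper named run_analytics.sh in HDFS, which would itself invoke hive -f, pig -f and spark-submit in order:

    <action name="run-all-scripts">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>run_analytics.sh</exec>
            <!-- ship the wrapper script (and anything it needs) with the action -->
            <file>${nameNode}/apps/analytics/run_analytics.sh#run_analytics.sh</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

Note that the script runs on an arbitrary NodeManager, so keep it self-contained and don't rely on anything installed only on one host.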

If your cluster is Kerberized, I would not suggest using the shell action, as it creates a lot of authentication-related issues.

Hope this information helps! Happy Hadooping! 🙂


2 REPLIES

Expert Contributor

You might also want to look at Workflow Designer, which is in Technical Preview in HDP 2.5. You can try it in the sandbox (http://hortonworks.com/downloads/#sandbox) to get an idea of how to create Oozie workflows with Pig, Hive, and Spark actions.