Created 11-30-2016 11:05 AM
Hi all,
Is it possible to create an Oozie workflow that automatically executes some Hive, Pig and Spark scripts, in order to automate my analytics process?
Many thanks!
Created 11-30-2016 07:37 PM
Yes! You can use Hive/Pig/Spark actions in the order your process requires and control the flow between them (for example, if the Hive action succeeds, move on to the Pig node and so on; otherwise go to a fail node).
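To make the chaining concrete, here is a minimal sketch of what such a workflow.xml could look like. The workflow name, node names, script names, Spark class and jar path are all hypothetical placeholders, and ${jobTracker} and ${nameNode} are assumed to be defined in your job.properties. The schema versions are ones I'd expect to work on an HDP 2.5-era Oozie, but check which versions your Oozie release supports.

```xml
<workflow-app name="analytics-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-node"/>

    <!-- Step 1: Hive script (hypothetical file name) -->
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>prepare_tables.hql</script>
        </hive>
        <ok to="pig-node"/>      <!-- on success, move to Pig -->
        <error to="fail"/>       <!-- on failure, go to the kill node -->
    </action>

    <!-- Step 2: Pig script (hypothetical file name) -->
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>transform.pig</script>
        </pig>
        <ok to="spark-node"/>
        <error to="fail"/>
    </action>

    <!-- Step 3: Spark job (class and jar path are placeholders) -->
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>analytics-spark</name>
            <class>com.example.Analytics</class>
            <jar>${nameNode}/apps/analytics/lib/analytics.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Each action's ok/error transitions are what give you the "if successful then move on, else fail" behaviour.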
OR
You can create a shell script that calls your Hive/Pig/Spark scripts in the appropriate order and use Oozie's shell action to execute that script.
If your cluster is Kerberized, I would not recommend the shell action, as it tends to cause a lot of authentication-related issues.
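For the shell-script route, the action inside the workflow could look roughly like the sketch below. The script name run_analytics.sh and the ${scriptPath} property are hypothetical (you would define the latter in job.properties as the HDFS location of the script), and the "end" and "fail" nodes are assumed to exist elsewhere in the same workflow.

```xml
<action name="run-analytics">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- The script that calls Hive, Pig and Spark in order -->
        <exec>run_analytics.sh</exec>
        <!-- Ship the script from HDFS into the action's working directory -->
        <file>${scriptPath}/run_analytics.sh#run_analytics.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Keep in mind that with this approach Oozie only sees the exit code of the script as a whole, so the ordering and error handling between the individual steps has to live inside the script itself.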
Hope this information helps! Happy Hadooping! 🙂
Created 12-01-2016 08:54 PM
You might want to look at Workflow Designer too, which is a Technical Preview in HDP 2.5. You can try it in the sandbox (http://hortonworks.com/downloads/#sandbox) to get an idea of how to create Oozie workflows with Pig, Hive, and Spark actions.