I'm planning to automate some ETL jobs on tables that I have in Hive using pyspark. I've been using Zeppelin with the pyspark interpreter (%pyspark) to develop my code, and I want to use Oozie to automate it.
As far as I know, Oozie can only automate Python scripts (.py files) and not Zeppelin notebooks. Is there any way I can convert my existing Zeppelin notebooks into Python scripts?
Also, I'm not sure if there is a way to use Oozie to spark-submit a Python script, so that I can take advantage of Spark and YARN for parallel processing.
The easiest way to export a single Zeppelin notebook to a script file is to copy and paste the content of each cell into the file (and structure it according to the language, of course). You can also write your own parser for the notebook's .json file, but for one or a few notebooks that is probably more effort than copying and pasting.
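If you do go the parser route, a minimal sketch could look like the one below. It assumes the exported note is a JSON object with a top-level "paragraphs" list whose entries carry the cell source in a "text" field that starts with the interpreter directive; check this against a note exported from your own Zeppelin version. The function name notebook_to_script is made up for this example.

```python
import json


def notebook_to_script(note_json, interpreter="%pyspark"):
    """Concatenate the source of every paragraph that uses `interpreter`.

    Assumes the note layout {"paragraphs": [{"text": "%pyspark\\n..."}, ...]}.
    """
    note = json.loads(note_json)
    chunks = []
    for paragraph in note.get("paragraphs", []):
        # "text" can be missing or null for empty paragraphs.
        text = (paragraph.get("text") or "").strip()
        # Split the interpreter directive (first token) from the cell body.
        tokens = text.split(None, 1)
        if not tokens or tokens[0] != interpreter:
            continue  # skip %md, %sql, and other interpreters
        body = tokens[1] if len(tokens) > 1 else ""
        if body.strip():
            chunks.append(body.rstrip())
    # Separate cells with a blank line and end the file with a newline.
    return "\n\n".join(chunks) + "\n"
```

Read the exported note with open(...).read(), pass the string to notebook_to_script, and write the result to a .py file. Keep in mind that Zeppelin creates variables such as spark and sc for you, so the exported script has to create its own SparkSession (with enableHiveSupport() if it reads Hive tables) before the notebook code runs.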
You can schedule a spark-submit using Oozie. You can also use Zeppelin's cron scheduler to schedule notebooks directly from the Zeppelin interface (see here).
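For reference, whatever ends up launching the job has to run an ordinary spark-submit command. Here is a small sketch that assembles one for YARN; --master and --deploy-mode are standard spark-submit options, while the script name etl_job.py, the date argument, and the helper build_spark_submit are placeholders for this example.

```python
import shlex


def build_spark_submit(script, master="yarn", deploy_mode="cluster", app_args=()):
    """Return the spark-submit command as a list of arguments."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, script]
    # Anything after the script path is passed to the script itself.
    cmd.extend(app_args)
    return cmd


# etl_job.py and the date argument are hypothetical.
command = build_spark_submit("etl_job.py", app_args=["2024-01-01"])
print(" ".join(shlex.quote(arg) for arg in command))
# prints: spark-submit --master yarn --deploy-mode cluster etl_job.py 2024-01-01
```

The list form can be passed straight to subprocess.run if you want to test the submission by hand before wiring it into an Oozie workflow.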