Convert Zeppelin notebook to Python script and run it with spark-submit from Oozie

Explorer

Hi,

I'm planning to automate some ETL jobs on tables that I have in Hive using PySpark. I've been developing my code in Zeppelin with the PySpark interpreter (%pyspark) and want to use Oozie to automate it.

As far as I know, Oozie can only automate Python scripts (.py files), not Zeppelin notebooks. Is there any way to convert my existing Zeppelin notebooks into Python scripts?

Also, is there a way to use Oozie to spark-submit a Python script, so that it takes advantage of Spark and YARN for parallel processing?

Thanks!

Cloudera Employee

Hi @John Tan,

The easiest way to export a single Zeppelin notebook to a script file is to copy and paste the content of each paragraph into the file (and structure it according to the language, of course). You can also write your own parser for the notebook's .json file, but for one or a few notebooks that is probably more effort than copy-pasting.
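If you do go the parser route, a minimal sketch could look like the one below. It assumes the JSON layout Zeppelin uses for notes: a top-level `paragraphs` list whose entries hold the cell source in a `text` field, starting with an interpreter directive such as `%pyspark`. The function names, the `PYTHON_INTERPRETERS` set and the app name in the header are illustrative, not part of Zeppelin. Because Zeppelin injects `spark` and `sc` into %pyspark paragraphs for you, the generated script prepends a hypothetical header that builds its own Hive-enabled `SparkSession`.

```python
import json
from pathlib import Path

# Interpreter directives treated as Python; adjust to your interpreter bindings
# (e.g. %spark2.pyspark on some HDP installs).
PYTHON_INTERPRETERS = {"%pyspark", "%spark.pyspark", "%spark2.pyspark", "%python"}

# Zeppelin injects `spark` and `sc` into %pyspark paragraphs; a standalone
# script has to create them itself (the app name is a placeholder).
HEADER = '''from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("zeppelin-export")
         .enableHiveSupport()
         .getOrCreate())
sc = spark.sparkContext
'''


def paragraph_to_code(text):
    """Return the Python source of one paragraph, or None if it is not Python."""
    if not text or not text.strip():
        return None
    stripped = text.lstrip()
    if stripped.startswith("%"):
        head, _, body = stripped.partition("\n")
        directive = head.split()[0]
        if directive not in PYTHON_INTERPRETERS:
            return None  # %md, %sql, %sh, ... paragraphs are skipped
        # Keep any code written on the same line as the directive.
        rest = head[len(directive):].strip()
        body = f"{rest}\n{body}" if rest else body
    else:
        # No directive: the note's default interpreter applies; assumed to be pyspark here.
        body = stripped
    body = body.rstrip()
    return body or None


def convert_note(note):
    """Concatenate the Python paragraphs of a parsed Zeppelin note into one script."""
    chunks = [HEADER]
    for i, para in enumerate(note.get("paragraphs", []), start=1):
        code = paragraph_to_code(para.get("text"))
        if code is not None:
            chunks.append(f"# --- paragraph {i} ---\n{code}\n")
    return "\n".join(chunks)


def convert_file(src, dst):
    """Read a Zeppelin note JSON file and write the extracted Python script."""
    note = json.loads(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text(convert_note(note), encoding="utf-8")
```

For example, `convert_file("note.json", "etl_job.py")` writes a script you can first try by hand with `spark-submit etl_job.py` before wiring it into Oozie. Non-Python paragraphs (%md, %sql and so on) are dropped rather than translated, so review the output before relying on it.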

You can schedule a spark-submit with Oozie (for example, through its Spark action). You can also use Zeppelin's cron scheduler to schedule notebooks directly from the Zeppelin interface (see here).
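With Oozie's Spark action, the `<jar>` element can point at a `.py` file, so the converted script runs through spark-submit on YARN. The sketch below generates a minimal one-action `workflow.xml`, assuming the `uri:oozie:workflow:0.5` and `uri:oozie:spark-action:0.1` schemas. The app name, node names and HDFS path are placeholders, and `${jobTracker}`/`${nameNode}` are expected to be defined in your `job.properties`. Check the schema versions and the `master` value against the Oozie and Spark versions on your cluster.

```python
import xml.etree.ElementTree as ET

WF_NS = "uri:oozie:workflow:0.5"
SPARK_NS = "uri:oozie:spark-action:0.1"


def build_workflow(app_name, script_path, spark_opts="", args=()):
    """Return workflow.xml text with a single Spark action that runs a .py script."""
    # xmlns is set as a plain attribute so the namespaces serialize without prefixes.
    wf = ET.Element("workflow-app", {"name": app_name, "xmlns": WF_NS})
    ET.SubElement(wf, "start", {"to": "spark-etl"})

    action = ET.SubElement(wf, "action", {"name": "spark-etl"})
    spark = ET.SubElement(action, "spark", {"xmlns": SPARK_NS})
    # Children must appear in schema order:
    # job-tracker, name-node, master, name, jar, spark-opts, arg.
    for tag, text in [
        ("job-tracker", "${jobTracker}"),  # resolved from job.properties
        ("name-node", "${nameNode}"),      # resolved from job.properties
        ("master", "yarn-cluster"),        # run the driver inside YARN
        ("name", app_name),
        ("jar", script_path),              # for PySpark, the .py file goes here
    ]:
        ET.SubElement(spark, tag).text = text
    if spark_opts:
        ET.SubElement(spark, "spark-opts").text = spark_opts
    for arg in args:
        ET.SubElement(spark, "arg").text = arg

    # Go to <end> on success, to the <kill> node on failure.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(wf, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "Spark ETL failed: [${wf:errorMessage(wf:lastErrorNode())}]"
    )
    ET.SubElement(wf, "end", {"name": "end"})

    ET.indent(wf)  # pretty-print (Python 3.9+)
    return ET.tostring(wf, encoding="unicode")


if __name__ == "__main__":
    print(build_workflow(
        "hive-etl",
        "${nameNode}/user/etl/etl_job.py",  # hypothetical HDFS path
        spark_opts="--executor-memory 2G --num-executors 4",
        args=["2024-01-01"],
    ))
```

Save the output as `workflow.xml`, upload it and the script to HDFS, and submit with `oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run`. For a recurring ETL run you would typically wrap the workflow in an Oozie coordinator. Depending on the Oozie version, PySpark jobs may also need `oozie.use.system.libpath=true` in `job.properties` so the Spark sharelib is available.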

Regards,

Damien.

New Contributor

Try this Python script:

https://github.com/sat28/zeppelin_notebook_to_script

It converts any Python notebook into a script with the extension you choose.