About mngeowcy

mngeowcy · ‎10-02-2018

Hi, I'm planning to automate some ETL jobs on tables that I have in Hive using pyspark. I've been using Zeppelin with pyspark interpreter (%pyspark) to develop my code, and want to use oozie to automate it. As far as I know, oozie can only automate python scripts (.py files) and not Zeppelin notebooks, is there any way I can convert my existing Zeppelin notebooks into python scripts? Also, I'm not sure if there is a way to use oozie to spark-submit a python script, to take advantage of Spark & Yarn for parellel processing. Thanks!

mngeowcy · ‎08-28-2018

Thanks for the link! It helped to clear up most of my questions.

mngeowcy · ‎08-27-2018

I've begun exploring Druid since hearing about the Druid + Hive integration. From what I can see, Druid tables offer real-time querying and way quicker pre-aggregation. With that in mind, I'm curious asa to when Hive tables would be used over Druid tables? Maybe when you have to calculate statistics ( average, SD ) ?

Online	Offline
Last Visited	‎03-13-2019 02:32 AM

Member Since	‎08-24-2018 09:41 AM
Last Visited	‎03-13-2019 02:32 AM
Posts	6

Cloudera Community

Convert Zeppelin notebook to python script and ooz...

Re: When to use hive table over druid table

When to use hive table over druid table