Member since 08-24-2018 | 6 Posts | 0 Kudos Received | 0 Solutions
10-02-2018 01:14 PM
Hi, I'm planning to automate some ETL jobs on Hive tables using PySpark. I've been developing the code in Zeppelin with the PySpark interpreter (%pyspark) and want to use Oozie to schedule it. As far as I know, Oozie can only run Python scripts (.py files), not Zeppelin notebooks. Is there any way to convert my existing Zeppelin notebooks into Python scripts? Also, I'm not sure whether Oozie can spark-submit a Python script so that the job takes advantage of Spark and YARN for parallel processing. Thanks!
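To make the conversion part concrete, here is a rough sketch of the kind of thing I have in mind. It assumes the notebook is exported as Zeppelin's note JSON, where each entry under "paragraphs" carries its source in a "text" field that starts with the interpreter directive. The function name `note_to_script` and the sample note are made up for illustration, and I haven't checked this against every Zeppelin version.

```python
import json


def note_to_script(note: dict, interpreter: str = "%pyspark") -> str:
    """Collect the code of all paragraphs bound to `interpreter` into one script.

    Assumes the exported note has a top-level "paragraphs" list whose items
    keep their source in a "text" field (hypothetical helper, not a Zeppelin API).
    """
    chunks = []
    for paragraph in note.get("paragraphs", []):
        text = (paragraph.get("text") or "").lstrip()
        head, _, body = text.partition("\n")
        tokens = head.split(None, 1)
        # Skip empty paragraphs and ones for other interpreters (%md, %sql, ...).
        if not tokens or tokens[0] != interpreter:
            continue
        # Code may follow the directive on the same line or start on the next one.
        inline = tokens[1] if len(tokens) > 1 else ""
        code = (inline + "\n" + body if inline else body).strip("\n")
        if code:
            chunks.append(code)
    return "\n\n".join(chunks) + "\n"


# Small in-memory note standing in for json.load(open("note.json")).
sample_note = json.loads("""
{"paragraphs": [
  {"text": "%pyspark\\ndf = spark.table('sales')"},
  {"text": "%md\\n# Notes"},
  {"text": "%pyspark\\ndf.groupBy('region').count().show()"}
]}
""")
print(note_to_script(sample_note))
```

One caveat with this approach: Zeppelin injects `spark`, `sc` and `z` into the interpreter, so a standalone script would have to build its own SparkSession (with Hive support enabled) and drop any `z.` calls before it could be run through spark-submit.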
Labels: Apache Oozie, Apache Spark, Apache Zeppelin
08-27-2018 01:39 AM
I've begun exploring Druid since hearing about the Druid + Hive integration. From what I can see, Druid tables offer real-time querying and much faster pre-aggregation. With that in mind, I'm curious as to when Hive tables would be used over Druid tables. Maybe when you have to calculate statistics (average, SD)?
Labels: Apache Hive