Which tool to schedule Hive queries?
Labels: Apache Hive
Created ‎03-22-2016 09:47 AM
Hi,
I would like to create/update Hive tables, let's say once per hour.
Which tool should I use to schedule the execution of my HQL script?
Thank you in advance.
Created ‎03-23-2016 04:51 AM
Hi @Lubin Lemarchand, for simple, independent, single-script jobs you can also use cron. For jobs consisting of two or more scripts you can use Oozie, which is the scheduling tool of choice in the Hadoop ecosystem. Initially, Oozie didn't support Hive, but now it does, together with Pig, MapReduce, Sqoop, Spark and other so-called "actions" (steps). Long term, it's definitely worth learning more about Oozie: quick start.
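For the simple cron case, here is a minimal sketch in Python of what the hourly entry could look like. It only builds the `hive -f <script>` command and the matching crontab line; the script path, log path and helper names are made up for illustration, and it assumes the `hive` CLI is on the PATH of the cron user.

```python
import shlex
import subprocess


def hive_command(script_path, hive_bin="hive"):
    """Build the argument list that runs one HQL script with the Hive CLI."""
    # `hive -f <file>` executes the statements contained in the file.
    return [hive_bin, "-f", script_path]


def hourly_crontab_line(script_path, log_path, minute=0):
    """Return a crontab entry that runs the script once per hour."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    cmd = " ".join(shlex.quote(part) for part in hive_command(script_path))
    # Fields: minute hour day-of-month month day-of-week command.
    # Output and errors are appended to a log file so failures stay visible.
    return f"{minute} * * * * {cmd} >> {shlex.quote(log_path)} 2>&1"


def run_once(script_path):
    """Run the script immediately (useful for testing before scheduling)."""
    # Requires a working Hive client on this machine.
    return subprocess.run(hive_command(script_path), check=False).returncode


if __name__ == "__main__":
    # Hypothetical paths, not taken from the question.
    print(hourly_crontab_line("/home/etl/update_tables.hql",
                              "/home/etl/update_tables.log"))
```

The printed line can be pasted into `crontab -e`. cron usually runs with a reduced environment, so you may need to pass the full path to the Hive binary via `hive_bin`. Once a second script has to wait for the first one, this approach gets fragile, which is where Oozie comes in.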
Created ‎03-22-2016 11:20 AM
Why can't you use an Oozie time-based workflow?
Created ‎03-22-2016 12:25 PM
Well, most of the tutorials I found on Oozie were written two or three years ago, so I was wondering whether it is still the recommended technology for this kind of thing.
Created ‎03-22-2016 12:34 PM
You can also use Falcon with a Hive action.
Created ‎03-23-2016 06:52 AM
You have also invested time in learning NiFi. NiFi has a cron-based scheduler, and it would be interesting to see it leveraged for this use case. Look under the Scheduling tab: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Configuring_a_Processor
Created ‎03-23-2016 08:44 AM
Yes, I use it to query the API I'm interested in once a week (except for the Twitter streaming API, of course).
