We are planning to use Flume to transfer files of 2-5 GB (in different directories) on a weekly basis, and we want to be notified (preferably by email) if any of our Flume jobs fail.
Can we use Oozie (Workflow or Coordinator in Hue)? If not, is there any other Hadoop tool that provides the above functionality (job scheduling and error notification)?
Any help or links would be much appreciated.
Thanks much in advance and please let me know if you need more info.
- For job scheduling you can use Oozie. However, it applies to the jobs (Pig, Hive, Java) that run after you have ingested your data into HDFS.
- In order to receive emails from your platform (if anything goes wrong) you have to configure the SMTP settings ==> http://www.cloudera.com/content/cloudera/en/documentation/archives/cloudera-manager-4/v4-5-4/Clouder...
- Flume: Flume does not run on a schedule. It processes data as it receives it.
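If you end up needing a notification outside of what the platform sends, here is a minimal sketch of a failure email using Python's standard `smtplib`. It is not part of Flume, Oozie, or Cloudera Manager; the function names, the subject wording, and the `localhost:25` SMTP relay are assumptions for illustration only.

```python
import smtplib
from email.message import EmailMessage


def build_failure_email(job_name, detail, sender, recipient):
    """Build a plain-text failure notice (hypothetical helper, not a Flume API)."""
    msg = EmailMessage()
    msg["Subject"] = f"Flume job failed: {job_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(detail)
    return msg


def send_email(msg, host="localhost", port=25):
    """Send the message through an SMTP relay; host and port are assumed values."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

You would call `build_failure_email(...)` from whatever script detects the failure and pass the result to `send_email(...)`, pointing it at your own mail relay.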
Thanks for the detailed explanation about Flume. Can you please tell me how to keep Flume running all the time?
What you'll usually find is that a Flume agent, not to be confused with Flume itself, is set up and started through Cloudera Manager and runs as a service there.
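If the agent is not managed by Cloudera Manager, one rough alternative is a small wrapper that starts it with `flume-ng agent` and restarts it when it exits. This is only a sketch under assumptions: the agent name, config paths, restart count, and delay are made up, and `supervise` is a hypothetical helper, not something Flume provides.

```python
import subprocess
import time


def flume_agent_command(agent_name, conf_dir, conf_file):
    """Build the flume-ng command line for one agent."""
    return [
        "flume-ng", "agent",
        "--name", agent_name,
        "--conf", conf_dir,
        "--conf-file", conf_file,
    ]


def supervise(command, max_restarts=3, delay_seconds=10,
              run=subprocess.call, sleep=time.sleep):
    """Run the command and restart it each time it exits.

    Returns the exit codes seen, one per run (hypothetical helper).
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        exit_codes.append(run(command))
        if attempt < max_restarts:
            sleep(delay_seconds)  # brief pause before the next restart
    return exit_codes
```

For example, `supervise(flume_agent_command("a1", "/etc/flume-ng/conf", "/etc/flume-ng/conf/flume.conf"))` would start an agent named `a1` (both paths are placeholders). A service manager such as Cloudera Manager does this more robustly, so treat the wrapper as a stopgap.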