Created 10-06-2016 06:18 PM
For example, suppose NiFi delivers data to HDFS, and I have a sequence of Hive and/or Spark jobs that need to run against that data in an HDP cluster. Is it a good idea to orchestrate those successive Hive/Spark jobs using NiFi's ExecuteProcess/ExecuteScript processors, as opposed to writing Oozie workflows and Falcon processes?
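For concreteness, the kind of step I have in mind is an ExecuteProcess processor that simply shells out to the existing CLI tools once the data has landed; the binary path, class, jar, and input path below are placeholders, not a working config:

```
Command:           /usr/bin/spark-submit
Command Arguments: --master yarn --deploy-mode cluster
                   --class com.example.NightlyAggregation
                   /apps/nightly-agg.jar /landing/2016-10-06
```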
Created 10-06-2016 06:49 PM
Short answer - kinda; it depends on your expectations of a scheduler. NiFi is perfectly capable of kicking off jobs once it has prepared and landed the data. The nature of a scheduler, though, is often to wait for a job to finish, retry it, act on the result, etc. Depending on your actual infrastructure, you may find NiFi less convenient for handling such hierarchical dependencies than a scheduler that was designed for this purpose.
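To make that concrete: kicking a job off is one processor, but waiting on it is where it gets awkward. Below is a rough sketch (an illustration, not a recommendation) of the kind of ExecuteScript (Jython) glue you would end up writing to poll a job and route for retry; the ResourceManager host and the yarn.app.id attribute are assumptions for the example:

```python
# Hypothetical ExecuteScript (Jython) body. 'session', 'REL_SUCCESS' and
# 'REL_FAILURE' are variables NiFi binds for the script; the attribute
# name and the ResourceManager URL are placeholders.
import urllib2, json

flowFile = session.get()
if flowFile is not None:
    appId = flowFile.getAttribute('yarn.app.id')
    # Poll the YARN ResourceManager REST API for the job's final status.
    resp = urllib2.urlopen('http://rm-host:8088/ws/v1/cluster/apps/' + appId)
    status = json.load(resp)['app']['finalStatus']
    if status == 'SUCCEEDED':
        session.transfer(flowFile, REL_SUCCESS)
    else:
        # Route back to the submitting processor to retry, or fail out.
        session.transfer(flowFile, REL_FAILURE)
```

Every piece of that (polling loops, retry counters, dependency fan-in) is something Oozie gives you declaratively, which is the trade-off to weigh.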
Created 10-06-2016 06:53 PM
Use Falcon for MapReduce, Sqoop, and Flume jobs.
Use NiFi for everything else.