
Is it a good idea (generally) to use NiFi as a scheduler for HDP processes?

For example, if NiFi delivers data to HDFS somehow, and I have a sequence of Hive and/or Spark jobs that need to run against that data in an HDP cluster, is it a good idea to orchestrate those successive Hive/Spark jobs with the NiFi ExecuteProcess/ExecuteScript processors, as opposed to writing Oozie workflows and Falcon processes?
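
For concreteness, the kind of thing I would end up calling from ExecuteProcess is a chained wrapper script roughly like the sketch below (the JDBC URL, file paths, and class name are placeholders, not anything real):

#!/usr/bin/env python
# Hypothetical wrapper an ExecuteProcess processor could invoke after NiFi
# lands data in HDFS. Connection string, paths, and class name are placeholders.
import subprocess

# Step 1: run the Hive job against the newly landed data via beeline.
subprocess.run(
    ["beeline", "-u", "jdbc:hive2://hiveserver:10000", "-f", "/etc/jobs/aggregate.hql"],
    check=True,  # abort here if Hive fails, so the Spark step never runs
)

# Step 2: run the dependent Spark job only after Hive succeeds.
subprocess.run(
    ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster",
     "--class", "com.example.Enrich", "/opt/jobs/enrich.jar"],
    check=True,
)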

1 ACCEPTED SOLUTION

Short answer - kinda; it depends on your expectations of a scheduler. NiFi is perfectly capable of kicking off jobs once it has prepared and landed the data. The nature of a scheduler, though, is often to wait for a job to finish, retry it, act on its outcome, and so on. Depending on your actual infrastructure, you may find NiFi less convenient for handling such hierarchical dependencies than a scheduler designed for that purpose.
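
To make the trade-off concrete, here is a rough sketch, with made-up commands, retry counts, and intervals, of the wait/retry plumbing you would have to hand-roll around ExecuteProcess to approximate what a scheduler like Oozie gives you out of the box:

#!/usr/bin/env python
# Illustrative sketch only: the wait/retry logic a dedicated scheduler provides
# is what you end up writing yourself around ExecuteProcess. All commands,
# retry counts, and backoff intervals below are placeholders.
import subprocess
import sys
import time

MAX_RETRIES = 3

def run_with_retries(cmd):
    # Run one job, retrying on failure; NiFi itself only sees the final exit code.
    for attempt in range(1, MAX_RETRIES + 1):
        if subprocess.run(cmd).returncode == 0:
            return
        time.sleep(60 * attempt)  # crude backoff between attempts
    sys.exit(1)  # give up; there is no per-step recovery or SLA handling here

# Downstream dependencies are chained by hand rather than declared as a DAG.
run_with_retries(["beeline", "-u", "jdbc:hive2://hiveserver:10000", "-f", "/etc/jobs/step1.hql"])
run_with_retries(["spark-submit", "--master", "yarn", "/opt/jobs/step2.py"])

Even this only covers a linear chain; fan-out, fan-in, and calendar-based triggers push you further toward a purpose-built scheduler.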

Master Guru

Use Falcon for MapReduce, Sqoop, and Flume jobs.

Use NiFi for everything else.