Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Is it a good idea (generally) to use NiFi as a scheduler for HDP processes?

New Member

For example, if NiFi delivers data to HDFS somehow, and I have a sequence of Hive and/or Spark jobs that need to run against that data in an HDP cluster, is it a good idea to orchestrate those successive Hive/Spark jobs using the NiFi ExecuteProcess/ExecuteScript processors, as opposed to writing Oozie workflows and Falcon processes?

1 ACCEPTED SOLUTION


Short answer: kinda, it depends on what you expect from a scheduler. NiFi is perfectly capable of kicking off jobs once it has prepared and landed the data. The nature of a scheduler, though, is often to wait for a job to finish, retry it, act on the result, and so on. Depending on your actual infrastructure, you may find NiFi less convenient for handling such hierarchical dependencies than a scheduler that was designed for this purpose.
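To illustrate the point: NiFi's ExecuteProcess only sees the command's final exit code, so any retry or wait-for-completion logic has to live in the script it invokes. A minimal sketch of such a wrapper (the function name and the beeline invocation in the usage comment are hypothetical examples, not part of any NiFi API):

```shell
#!/bin/sh
# Hypothetical retry wrapper for a command launched by NiFi's ExecuteProcess.
# Retries the given command up to a maximum number of attempts; a non-zero
# exit routes the flowfile to the processor's failure relationship.
retry_job() {
  max="$1"; shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "job failed after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "job succeeded on attempt $attempt"
}

# Example (placeholder Hive invocation): retry the job up to 3 times.
# retry_job 3 beeline -u "jdbc:hive2://hiveserver:10000/default" -f step1.hql
```

Even with a wrapper like this, cross-job dependencies (run Spark only if Hive succeeded) still have to be wired up in the flow itself, which is exactly where a purpose-built scheduler is more convenient.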



Master Guru

Use Falcon for MapReduce, Sqoop and Flume jobs.

Use NiFi for everything else.