How can we schedule a NiFi data flow?

Contributor

I want to extract Twitter data on a daily basis for some time. Is there any way to do this using NiFi?

1 ACCEPTED SOLUTION

Hi @Sushant Bharti,

1) A simple way to schedule this is to use the scheduler in the processor and run it on a cron/timer schedule, like below:

[Screenshot of the processor's scheduling settings: 6938-screen-shot-2016-08-24-at-110038-am.png]

In this case the flow stays running, but the processor only pulls data at the times you specify. You can choose from the options there.

2) The other way is to use the NiFi API (nifi-api) to start and stop the processors from outside NiFi. You can find details in the question below:

https://community.hortonworks.com/questions/48445/executing-apache-nifi-through-a-shell-script.html#...

Thanks,

Jobin George


6 REPLIES

Master Guru

@Sushant Bharti, you have several options:

The first configuration option is the Scheduling Strategy. There are three possible options for scheduling components:

  • Timer driven: This is the default mode. The Processor will be scheduled to run on a regular interval. The interval at which the Processor is run is defined by the ‘Run schedule’ option (see below).
  • Event driven: When this mode is selected, the Processor will be triggered to run by an event, and that event occurs when FlowFiles enter Connections feeding this Processor. This mode is currently considered experimental and is not supported by all Processors. When this mode is selected, the ‘Run schedule’ option is not configurable, as the Processor is not triggered to run periodically but as the result of an event. Additionally, this is the only mode for which the ‘Concurrent tasks’ option can be set to 0. In this case, the number of threads is limited only by the size of the Event-Driven Thread Pool that the administrator has configured.
  • CRON driven: When using the CRON driven scheduling mode, the Processor is scheduled to run periodically, similar to the Timer driven scheduling mode. However, the CRON driven mode provides significantly more flexibility at the expense of increasing the complexity of the configuration. In this mode the ‘Run schedule’ value is a CRON expression made up of six fields, each separated by a space.
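
For the daily pull asked about in the question, a CRON driven ‘Run schedule’ could look like the sketch below. It assumes NiFi's Quartz-style field order (seconds, minutes, hours, day of month, month, day of week). The expression `0 0 2 * * ?` and the helper `describe_cron` are illustrative and do not come from the thread, and only the six-field form described above is handled.

```python
# Field order assumed from NiFi's Quartz-style CRON expressions.
CRON_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a six-field CRON expression into named fields (hypothetical helper)."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# Illustrative 'Run schedule' value: fire once a day at 02:00:00.
daily_at_2am = describe_cron("0 0 2 * * ?")
for name, value in daily_at_2am.items():
    print(f"{name:>12}: {value}")
```

Naming the fields makes it easier to spot the common mistake of pasting a five-field Unix crontab entry, which the helper rejects.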

more info here

Contributor

Please advise whether Event driven scheduling is available/implemented in NiFi 1.x and later versions.

Super Mentor

Anything you can do via the browser can be done by making calls to the NiFi API.

You could either set up an external process that runs a couple of curl commands to start and then stop the GetTwitter processor in your flow, or you could use a couple of InvokeHTTP processors in your dataflow (configured with the CRON driven scheduling strategy) to start and stop the GetTwitter processor on a given schedule.
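
As a rough sketch of the external-process option, the Python below does what the two curl calls would do: read the processor's current revision, then PUT the new state. It assumes an unsecured NiFi 1.x instance at `http://localhost:8080` and a request body with `revision` and `component` keys. The function names and the placeholder processor id are hypothetical, so check the shape against the REST API documentation for your version.

```python
import json
import urllib.request

# Assumption: an unsecured NiFi 1.x instance at this address.
NIFI_API = "http://localhost:8080/nifi-api"


def build_state_request(processor_id, revision, state):
    """Build the PUT body that asks NiFi to change a processor's state."""
    return {
        "revision": revision,
        "component": {"id": processor_id, "state": state},
    }


def set_processor_state(processor_id, state):
    """Fetch the current revision, then PUT the new state (RUNNING or STOPPED)."""
    url = f"{NIFI_API}/processors/{processor_id}"
    with urllib.request.urlopen(url) as response:
        current = json.load(response)
    body = build_state_request(processor_id, current["revision"], state)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder id; replace it with the GetTwitter processor's id from your flow.
example_id = "00000000-0000-0000-0000-000000000000"
print(json.dumps(build_state_request(example_id, {"version": 0}, "RUNNING")))
# set_processor_state(example_id, "RUNNING")  # schedule at the daily start time
# set_processor_state(example_id, "STOPPED")  # schedule at the daily stop time
```

Reading the revision immediately before the PUT avoids hard-coding a version number that may be stale by the time the scheduled job runs.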

Matt

Rising Star

Hi Matt,

I am working on this "curl" stuff but getting an error.

I am using NiFi 1.0

Can you please see the below statements and let me know what I am missing?

This works fine:

curl -i -X GET http://localhost:8080/nifi-api/processors/1b943f28-3803-15dd-aec4-d362e560fbaf/state

It returns JSON as expected.

............................

This does not work:

curl -i -X PUT -H 'Content-Type: application/json' -d '{"version":27,"clientId":"ddf4a732-0158-1000-419b-512493387a32"},"processors":{"id":"1b943f28-3803-15dd-aec4-362e560fbaf","state":"RUNNING"}' http://localhost:8080/nifi-api/processors/1b943f28-3803-15dd-aec4-d362e560fbaf/

This gives:

HTTP/1.1 400 Bad Request
Date: Wed, 21 Dec 2016 05:50:43 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Server: Jetty(9.3.9.v20160517)

Message body is malformed. Unable to map into expected format.
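
One plausible reading of this 400 is that the request body is the problem. The first object closes right after `clientId`, so the body is not a single JSON document. The processor id inside the body is also missing the `d` that appears in the URL. The sketch below checks both points. The corrected shape, with `revision` and `component` keys, is an assumption about what NiFi 1.x expects, not something confirmed in this thread.

```python
import json

# Body copied from the failing curl command above.
posted_body = (
    '{"version":27,"clientId":"ddf4a732-0158-1000-419b-512493387a32"},'
    '"processors":{"id":"1b943f28-3803-15dd-aec4-362e560fbaf","state":"RUNNING"}'
)

try:
    json.loads(posted_body)
    parses = True
except json.JSONDecodeError as error:
    parses = False
    print("not valid JSON:", error.msg)

# Processor id taken from the request URL in the same command.
url_id = "1b943f28-3803-15dd-aec4-d362e560fbaf"

# Assumed NiFi 1.x shape: one object with "revision" and "component" keys.
corrected_body = json.dumps({
    "revision": {"version": 27, "clientId": "ddf4a732-0158-1000-419b-512493387a32"},
    "component": {"id": url_id, "state": "RUNNING"},
})
print(corrected_body)
```

The `version` value still has to match the processor's current revision, so it is worth re-reading it with a GET on the processor just before sending the PUT.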
