
No progress in NiFi flow

Explorer

My NiFi flow is not progressing, and it is not showing any errors either.

The flow should fetch a test.json file from S3 and load its content into a table in a PostgreSQL database:

ListS3 -> FetchS3Object -> JoltTransformJSON -> ........ -> PutSQL

Only ListS3 shows anything: in "view status" it lists my test.json as the key.


I'm not able to paste screenshots here.

Although all processors are started, none of them is processing anything: In and Out are both 0 bytes, and I don't see any errors.


The last messages logged in nifi-app.log are:

2019-06-14 07:26:48,864 INFO [NiFi Web Server-92] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.aws.s3.ListS3

2019-06-14 07:26:48,864 INFO [Timer-Driven Process Thread-4] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling ListS3[id=016b1032-cd19-10b1-6d77-5a63389cd903] to run

2019-06-14 07:26:49,166 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5aaa6545 // Another save pending = false

2019-06-14 07:26:51,166 INFO [NiFi Web Server-20] o.a.n.c.s.StandardProcessScheduler Stopping FetchS3Object[id=016b102b-cd19-10b1-385a-bb202d1cebc4]

2019-06-14 07:26:51,166 INFO [NiFi Web Server-20] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.aws.s3.FetchS3Object

2019-06-14 07:26:51,179 INFO [Timer-Driven Process Thread-4] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling FetchS3Object[id=016b102b-cd19-10b1-385a-bb202d1cebc4] to run

2019-06-14 07:26:51,185 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5aaa6545 // Another save pending = false

2019-06-14 07:26:56,833 INFO [NiFi Web Server-92] o.a.n.c.s.StandardProcessScheduler Starting ListS3[id=016b1032-cd19-10b1-6d77-5a63389cd903]

2019-06-14 07:26:56,836 INFO [Timer-Driven Process Thread-8] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled ListS3[id=016b1032-cd19-10b1-6d77-5a63389cd903] to run with 1 threads

2019-06-14 07:26:57,202 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5aaa6545 // Another save pending = false


Earlier the flow was partially working, with an error message at the SplitJson processor. Now I can't make progress because I don't understand where it is stuck. I tried restarting NiFi, with no luck.

Can you please suggest where the issue could be?

1 ACCEPTED SOLUTION

Master Guru

@Jayashree S

ListS3 is a stateful processor. Once it runs, it stores its state in the processor and then lists incrementally: if no new files have been added to the S3 directory, the processor won't list any files.

How to clear the state:

Stop the ListS3 processor, right-click it, select View state, and clear the state that is saved in the processor.

Then start the ListS3 processor again; it will now list all the files in the S3 directory.
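To make the "stateful, incremental" behaviour concrete, here is a minimal Python sketch of timestamp-based listing state. It is a simplified model under stated assumptions, not NiFi's ListS3 source: the class and field names (`IncrementalLister`, `current_timestamp`, `keys_at_timestamp`) and the integer timestamps are hypothetical. It shows why a second run over an unchanged bucket emits nothing (so every downstream processor sits at 0 bytes) and why clearing state makes the same file appear again.

```python
from dataclasses import dataclass, field


@dataclass
class S3Object:
    key: str
    last_modified: int  # hypothetical epoch-millis timestamp


@dataclass
class IncrementalLister:
    """Simplified model of timestamp-based listing state (not NiFi's ListS3 code)."""
    current_timestamp: int = -1                          # newest timestamp listed so far
    keys_at_timestamp: set = field(default_factory=set)  # keys already listed at that timestamp

    def list_new(self, bucket):
        # Emit only objects newer than the saved state, or unseen keys at the same timestamp.
        new = [o for o in bucket
               if o.last_modified > self.current_timestamp
               or (o.last_modified == self.current_timestamp
                   and o.key not in self.keys_at_timestamp)]
        if new:
            newest = max(o.last_modified for o in new)
            if newest > self.current_timestamp:
                # Moving to a newer timestamp: forget keys tracked for the old one.
                self.current_timestamp = newest
                self.keys_at_timestamp = set()
            self.keys_at_timestamp |= {o.key for o in new if o.last_modified == newest}
        return [o.key for o in new]

    def clear_state(self):
        # Model of "Clear state": the next run lists everything again.
        self.current_timestamp = -1
        self.keys_at_timestamp = set()


bucket = [S3Object("test.json", 1000)]
lister = IncrementalLister()
first = lister.list_new(bucket)   # ['test.json']
second = lister.list_new(bucket)  # [] : nothing new, so no FlowFiles go downstream
lister.clear_state()
third = lister.list_new(bucket)   # ['test.json'] again
```

Adding a newer object to `bucket` would make the next `list_new` call return only that object, which matches the "runs incrementally" description above.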


Master Guru

@Jayashree S

Use a RouteOnAttribute processor after ListS3 to filter for only the required file, and pass that to FetchS3Object.

Flow:

ListS3
RouteOnAttribute
FetchS3Object
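The filtering step can be modelled in a few lines of Python. This is a sketch, not NiFi code: it assumes RouteOnAttribute is configured with Routing Strategy "Route to Property name" and a hypothetical dynamic property named `matched` set to `${filename:equals('test.json')}`. ListS3 writes the object key into the `filename` attribute, so if the file sits under a prefix, compare against the full key. The bucket name `my-bucket` is made up for illustration.

```python
def route_on_attribute(flowfiles, wanted_key):
    """Model of RouteOnAttribute with a hypothetical property
    'matched' = ${filename:equals('test.json')} and 'Route to Property name'."""
    routes = {"matched": [], "unmatched": []}
    for attributes in flowfiles:
        # ListS3 stores the S3 object key in the 'filename' attribute.
        relationship = "matched" if attributes.get("filename") == wanted_key else "unmatched"
        routes[relationship].append(attributes)
    return routes


listed = [
    {"filename": "test.json", "s3.bucket": "my-bucket"},   # hypothetical bucket name
    {"filename": "other.json", "s3.bucket": "my-bucket"},
]
routes = route_on_attribute(listed, "test.json")
# Only the "matched" relationship would be connected to FetchS3Object.
```

With only `matched` connected, FetchS3Object can keep its default Object Key of `${filename}` and will receive just the one FlowFile for test.json.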

(or)

If you want to pull the same file from S3 every time, you can use this flow:

GenerateFlowFile // schedule this processor as per your requirements
FetchS3Object // configure the full S3 file path as the Object Key
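One point worth keeping in mind with either flow: FetchS3Object runs once per incoming FlowFile, so the number of fetches is driven by what arrives upstream, not by the Object Key. The Python sketch below models that under simplifying assumptions. `resolve_object_key` is a hypothetical stand-in for Expression Language that only understands `${filename}` or a literal key, and the bucket contents are invented. With GenerateFlowFile upstream, one trigger yields exactly one fetch of the fixed key; with a full ListS3 listing upstream, a literal key is fetched once per listed FlowFile.

```python
def resolve_object_key(property_value, attributes):
    # Hypothetical stand-in for Expression Language: handles only "${filename}" or a literal key.
    if property_value == "${filename}":
        return attributes["filename"]
    return property_value


def fetch_s3_object(incoming, bucket, object_key_property):
    """Model of FetchS3Object: one fetch per incoming FlowFile; bucket maps key -> content."""
    return [bucket[resolve_object_key(object_key_property, ff)] for ff in incoming]


bucket = {"test.json": '{"id": 1}', "other.json": '{"id": 2}'}  # hypothetical contents

# GenerateFlowFile: one empty FlowFile per scheduled run, fetched with a literal key.
from_generate = fetch_s3_object([{}], bucket, "test.json")

# A full ListS3 listing fed to the same literal key: still one fetch per listed FlowFile.
listed = [{"filename": "test.json"}, {"filename": "other.json"}]
from_list_literal = fetch_s3_object(listed, bucket, "test.json")
from_list_default = fetch_s3_object(listed, bucket, "${filename}")
```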


Explorer

Hi, the flow seems to progress now after clearing the state.

I actually want to pull a specific JSON file from S3 and push its content to a DB.

ListS3 and FetchS3Object are fetching all the files with the latest timestamp and later.

Even though I set a particular "Object Key" with my specific file name in FetchS3Object, all the files get fetched into the queue.

Can you please suggest a way to pull a specific JSON file?