Created 06-15-2019 08:04 AM
My NiFi flow is not progressing and is not showing any errors.
The flow is meant to fetch a test.json file from S3 and load its content into a table in a PostgreSQL database:
ListS3 -> FetchS3Object -> JoltTransformJSON -> ........ -> PutSQL
Only ListS3 shows anything: it lists my test.json as the key in "view status".
I'm not able to paste screenshots here.
Although all the processors have started, none of them is processing anything: In and Out both show 0 bytes, and I don't see any errors either.
The last messages logged in nifi-app.log are:
2019-06-14 07:26:48,864 INFO [NiFi Web Server-92] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.aws.s3.ListS3
2019-06-14 07:26:48,864 INFO [Timer-Driven Process Thread-4] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling ListS3[id=016b1032-cd19-10b1-6d77-5a63389cd903] to run
2019-06-14 07:26:49,166 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5aaa6545 // Another save pending = false
2019-06-14 07:26:51,166 INFO [NiFi Web Server-20] o.a.n.c.s.StandardProcessScheduler Stopping FetchS3Object[id=016b102b-cd19-10b1-385a-bb202d1cebc4]
2019-06-14 07:26:51,166 INFO [NiFi Web Server-20] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.aws.s3.FetchS3Object
2019-06-14 07:26:51,179 INFO [Timer-Driven Process Thread-4] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling FetchS3Object[id=016b102b-cd19-10b1-385a-bb202d1cebc4] to run
2019-06-14 07:26:51,185 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5aaa6545 // Another save pending = false
2019-06-14 07:26:56,833 INFO [NiFi Web Server-92] o.a.n.c.s.StandardProcessScheduler Starting ListS3[id=016b1032-cd19-10b1-6d77-5a63389cd903]
2019-06-14 07:26:56,836 INFO [Timer-Driven Process Thread-8] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled ListS3[id=016b1032-cd19-10b1-6d77-5a63389cd903] to run with 1 threads
2019-06-14 07:26:57,202 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5aaa6545 // Another save pending = false
The flow was partially working earlier, with an error message at the SplitJson processor. But now I can't proceed because I don't understand where it is stuck. I tried restarting NiFi, but no luck.
Can you please suggest where the issue could be?
Created 06-15-2019 02:36 PM
ListS3 is a stateful processor: once it has run, it stores its state and from then on lists incrementally. If no new files have been added to the S3 directory, the processor won't list any files.
How to clear the state:
Stop the ListS3 processor, right-click it, select "View state", and clear the state that is saved in the processor.
Then start ListS3 again; the processor will now list all the files in the S3 directory.
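To illustrate why nothing gets listed until the state is cleared, here is a minimal Python sketch of incremental listing. It is a simplified model under my own assumptions, not NiFi's actual implementation: the class StatefulLister, its methods, and the integer timestamps are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StatefulLister:
    """Hypothetical, simplified model of an incremental lister (not NiFi code)."""

    # Timestamp of the newest object already listed; None means no stored state.
    last_seen: Optional[int] = None

    def list_new(self, objects: Dict[str, int]) -> List[str]:
        """Return keys whose last-modified time is newer than the stored state."""
        new_keys = sorted(
            key
            for key, modified in objects.items()
            if self.last_seen is None or modified > self.last_seen
        )
        if new_keys:
            # Remember the newest timestamp so later runs skip these objects.
            self.last_seen = max(objects[key] for key in new_keys)
        return new_keys

    def clear_state(self) -> None:
        """Forget everything listed so far, similar in spirit to clearing state."""
        self.last_seen = None


# Assumed bucket content: one key with a made-up last-modified timestamp.
bucket = {"test.json": 100}

lister = StatefulLister()
first_run = lister.list_new(bucket)    # test.json is listed on the first run
second_run = lister.list_new(bucket)   # nothing new, so nothing is listed
lister.clear_state()
after_clear = lister.list_new(bucket)  # test.json is listed again
```

In this model the second run returns an empty list because the only object is not newer than the stored timestamp, which matches the symptom of a started flow with 0 bytes in and out.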
Created 06-19-2019 01:52 AM
Use a RouteOnAttribute processor after the ListS3 processor to filter for only the required file, and pass that to FetchS3Object.
Flow:
ListS3 -> RouteOnAttribute -> FetchS3Object
(or)
If you want to pull the same file from S3 every time, you can use this flow instead:
GenerateFlowFile //schedule this processor as per your requirements
-> FetchS3Object //configure the full S3 file path
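As a rough picture of what the RouteOnAttribute step in the first option does, here is a small Python sketch. It only models the filtering idea and is not NiFi code: the function route_on_filename, the "matched"/"unmatched" labels, and the extra sample keys are hypothetical, and it assumes each listed flow file carries the object key in a filename attribute. In NiFi itself the equivalent would typically be a RouteOnAttribute property using an Expression Language condition such as ${filename:equals('test.json')}, with only the matching relationship connected to FetchS3Object.

```python
from typing import Dict, List

# File name taken from the question; everything else below is illustrative.
WANTED_KEY = "test.json"


def route_on_filename(
    flowfiles: List[Dict[str, str]], wanted: str
) -> Dict[str, List[Dict[str, str]]]:
    """Split listed flow files into those matching the wanted name and the rest."""
    matched = [ff for ff in flowfiles if ff.get("filename") == wanted]
    unmatched = [ff for ff in flowfiles if ff.get("filename") != wanted]
    return {"matched": matched, "unmatched": unmatched}


# Assumed listing output: one attribute dictionary per object in the bucket.
listed = [
    {"filename": "test.json"},
    {"filename": "other.json"},
    {"filename": "data/test2.json"},
]

routes = route_on_filename(listed, WANTED_KEY)

# Only the matched flow files would be passed on to the fetch step.
to_fetch = [ff["filename"] for ff in routes["matched"]]
```

The point of the design is that the listing step still sees every object, but only the one you care about reaches the fetch step; the rest can be auto-terminated.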
Created 06-18-2019 07:48 AM
Hi, the flow seems to progress now after clearing the state.
I actually want to pull one specific JSON file from S3 and push its content to a DB.
ListS3 and FetchS3Object are fetching all files with the latest timestamp and later.
Even though I specify a particular "Object Key" with my specific file name in FetchS3Object, all the files end up fetched into the queue.
Can you please suggest a way to pull a specific JSON file?