01-09-2025 02:36 AM
Hello,

I am trying to implement batched file processing using NiFi processors. My use case: 70-80K files arrive daily, each 200-300 MB in size. The files are read from an S3 store and submitted for Spark execution through a Livy processor. Rather than sending each file location to Spark individually, the plan is to group several files into a batch and send the batch through Livy, which reduces the number of Livy connections to Spark.

The batch should be built according to the following rules:

1. Batch size: a batch is closed once its total size reaches a threshold, e.g. 1000 MB.
2. Wait duration: if not enough files arrive to reach the batch size, the batch is started anyway after a specified wait duration.

I am trying to implement this with Wait, Notify, and UpdateAttribute (using stateful variables), driven by batch size and wait time, but it is not fully working. Any leads or suggestions on how to implement this would be much appreciated.

Thanks.
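To make the two batching rules concrete, here is a minimal Python sketch of the intended logic. It is not NiFi configuration and does not use any NiFi API; the class name `Batcher`, the parameters `max_batch_mb` and `max_wait_s`, and the S3 paths are hypothetical, and the wait duration value is an assumption since no specific value is fixed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Batcher:
    """Groups file paths into batches by total size or by elapsed wait time."""
    max_batch_mb: float = 1000.0  # size threshold (example value from the post)
    max_wait_s: float = 300.0     # assumed wait duration; not specified in the post
    _files: List[str] = field(default_factory=list)
    _size_mb: float = 0.0
    _started_at: Optional[float] = None  # arrival time of the first file in the batch

    def add(self, path: str, size_mb: float, now: float) -> Optional[List[str]]:
        """Add one file; return the batch if the size threshold is reached."""
        if self._started_at is None:
            self._started_at = now
        self._files.append(path)
        self._size_mb += size_mb
        if self._size_mb >= self.max_batch_mb:
            return self._flush()
        return None

    def poll(self, now: float) -> Optional[List[str]]:
        """Return a partial batch if its first file has waited long enough."""
        if self._files and now - self._started_at >= self.max_wait_s:
            return self._flush()
        return None

    def _flush(self) -> List[str]:
        """Hand over the current batch and reset the accumulator."""
        batch, self._files = self._files, []
        self._size_mb = 0.0
        self._started_at = None
        return batch


# Rule 1: four 250 MB files reach the 1000 MB threshold on the fourth add.
b = Batcher(max_batch_mb=1000, max_wait_s=60)
for i, name in enumerate(["a", "b", "c"]):
    b.add(f"s3://example-bucket/{name}.dat", 250, now=float(i))
full_batch = b.add("s3://example-bucket/d.dat", 250, now=3.0)

# Rule 2: a lone file is released once the wait duration has elapsed.
b.add("s3://example-bucket/e.dat", 250, now=10.0)
too_early = b.poll(now=30.0)
timed_out_batch = b.poll(now=70.0)
```

In this sketch the wait timer starts when the first file of a batch arrives, so a partial batch is released at most `max_wait_s` seconds after its oldest file. Each returned list would correspond to one Livy submission.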
Labels: Apache NiFi