Execute processor only once for multiple flowfiles

Explorer

I have a flow like this: ListHDFS -> FetchHDFS -> PutFile -> ExecuteStreamCommand.

I place 15 files in a folder. They are copied to the local file system, and then a Python script is called to process them. All 15 files must be processed at once, because the data is merged and transformed, and the script produces a single output file. As I understand it, the above flow is executed for every flowfile, so the Python script will also run once per flowfile and produce multiple files. How do I make ExecuteStreamCommand run only once, after all 15 files have been placed in the source folder, so that I get only one output file from the Python script?
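One way to get the "run once" behaviour is to put the guard in the script itself. The sketch below is only an illustration under assumptions that are not in the original post: the function name `run_once`, the paths passed on the command line, the output file living outside the source folder, and the plain concatenation standing in for the real merge and transform logic are all hypothetical. The script does nothing until the expected number of files is present, and it creates the output file exclusively so that later invocations become no-ops.

```python
import sys
from pathlib import Path

EXPECTED_COUNT = 15  # number of input files stated in the question


def run_once(folder, out_path, expected=EXPECTED_COUNT):
    """Merge the files in `folder` into `out_path` at most once.

    Returns True if this call produced the output, False if it was skipped
    (not enough files yet, or the output already exists).
    """
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    if len(files) < expected:
        return False  # not all files have arrived yet; do nothing

    try:
        # Mode "x" fails if the file already exists, so only the first
        # invocation that sees all files writes the output.
        out = open(out_path, "x", encoding="utf-8")
    except FileExistsError:
        return False

    with out:
        for f in files:
            # Placeholder for the real merge/transform step (assumption).
            out.write(f.read_text(encoding="utf-8"))
    return True


# Hypothetical command line: script.py <source_folder> <output_file>
if __name__ == "__main__" and len(sys.argv) == 3:
    run_once(sys.argv[1], sys.argv[2])
```

With this guard, ExecuteStreamCommand can still fire for every flowfile, but only one invocation writes the result. An alternative inside NiFi might be to batch the flowfiles before ExecuteStreamCommand, for example with MergeContent and its Minimum Number of Entries property set to 15, so that only one flowfile reaches the script; that variant is not tested here.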

2 REPLIES

Super Guru

New Contributor

I have flowfiles with different dimensions, but they share a common id column. I want to use that column to join the flowfiles and pick specific columns. How can I use MergeContent in this case?
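As far as I understand, MergeContent bundles or concatenates flowfile contents rather than joining rows on a key, so the join itself may have to happen in a separate step such as a script. The sketch below shows what that step could look like, assuming the flowfiles are CSV with a header row. The function name `join_on_id` and the column names (`id`, `name`, `amount`) are hypothetical and only serve the illustration.

```python
import csv
import io


def join_on_id(csv_a, csv_b, key="id", keep=("id", "name", "amount")):
    """Inner-join two CSV texts on `key` and return only the `keep` columns."""
    # Index the second input by its key column for constant-time lookups.
    rows_b = {row[key]: row for row in csv.DictReader(io.StringIO(csv_b))}

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(keep), lineterminator="\n")
    writer.writeheader()

    for row in csv.DictReader(io.StringIO(csv_a)):
        match = rows_b.get(row[key])
        if match is None:
            continue  # inner join: skip ids missing from the second input
        merged = {**row, **match}
        writer.writerow({col: merged[col] for col in keep})
    return out.getvalue()


# Hypothetical sample data with different column sets but a shared id.
a = "id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n"
b = "id,amount\n2,7.5\n3,1.0\n"
print(join_on_id(a, b))
```

Only id 2 appears in both inputs, so the result has a header row plus a single data row. Rows whose id is missing from either input are dropped; a left or outer join would need different handling of the unmatched rows.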