Created 09-22-2016 07:26 AM
Hi All
I am trying to get tweets using NiFi and store them in a local file.
I have used the following configuration to get the tweets with the GetTwitter processor:
Twitter Endpoint: Filter Endpoint (all Twitter keys provided)
Languages: en
Terms to Filter On: facebook, wipro, google
I am trying to put the tweets into a file using PutFile.
The configuration I have used is:
Directory: /root/tweets
Conflict Resolution Strategy: fail
Create Missing Directories: true
Maximum File Count: 1
Last Modified Time:
Permissions: rw-r--r--
Owner: root
Group: root
I am able to get the tweets, but each one ends up in its own JSON file.
If I increase Maximum File Count to 20, it creates 20 JSON files and each file still contains only one tweet.
But I want all the tweets stored in a single JSON file.
In this post they mention using the MergeContent processor.
But I don't understand exactly how to use MergeContent, as I am completely new to NiFi.
Please suggest how to use it.
Do I have to use it after PutFile or before PutFile?
Please help.
Mohan.V
Created 09-22-2016 07:56 AM
Yes, MergeContent would be a solution to your problem. You have to use it before the PutFile processor in order to merge multiple flow files (each containing one JSON object) into one flow file (containing multiple JSON objects). You may want to have a look at the documentation here:
In particular, there is a property specifying the number of JSON objects you want merged into a single file: 'Minimum Number of Entries'.
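For example, a minimal MergeContent configuration for this kind of use case could look like the following (the values are only an illustration, adjust them to your needs):

Merge Strategy: Bin-Packing Algorithm
Merge Format: Binary Concatenation
Minimum Number of Entries: 20
Maximum Number of Entries: 1000
Max Bin Age: 5 min
Delimiter Strategy: Text
Demarcator: (a newline)

Then route the 'merged' relationship of MergeContent to your PutFile processor. Note that concatenating the tweets with a newline demarcator gives you one JSON object per line in the output file, not a single JSON array.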
As a side note, when you have a processor on your canvas, you can right click on it and go to 'Usage' to display the documentation of the processor.
Hope this helps.
Created 09-22-2016 10:54 AM
Thanks for your valuable suggestion, Pierre Villard.
It's done.
Created 12-23-2019 12:20 AM
Hello All,
I face a similar situation. I have tried using MergeContent before PutFile, but it ends up with the error: file with the same name already exists.
My Data flow is:
ConsumeMQTT (Reads Source) -- MergeContent (MergeFile) -- PutFile (Writes File)
Note:
If I set the "Conflict Resolution Strategy" in PutFile to "replace", the flow works, but only the latest value gets stored in the file. I cannot append data to the same file using the data flow above.
Kindly share your inputs.
Created 12-23-2019 09:58 AM
The NiFi PutFile processor does not support append. Append actions can cause all kinds of issues, especially with a NiFi cluster where the target directory of PutFile is mounted on all the NiFi nodes.
You can't have multiple nodes trying to write/append to the same file at the same time.
My suggestion would be to use the UpdateAttribute processor before your PutFile to modify the filename attribute. Perhaps prepend a UUID to the filename to ensure uniqueness across multiple files or NiFi nodes (if clustered), for example:
${UUID()}-${filename}
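As a sketch of the idea (the attribute name 'filename' is the standard core attribute, the naming convention is just an example), you would add a property in UpdateAttribute like this:

filename: ${UUID()}-${filename}

Each flow file is then written as <uuid>-<original filename>, so PutFile never sees two files with the same name. Adjust the expression to whatever naming convention suits you.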
Hope this helps you,
Matt
Created 12-23-2019 09:55 PM
@MattWho,
Thank you for the details. I understand your point that the same file cannot be accessed across cluster nodes. But I am running NiFi as a single node (without a cluster) and was thinking that this should work.
Yes, I do use UpdateAttribute with the file name convention below. This generates a separate flow file for every message. I am trying to have this as one single file per node.
${filename:replace(${filename},"FileName_"):append(${now():format("yyyy-MM-dd-HH-mm-ss")})}.Json
Thank you