Append multiple tweets into a single file using NiFi PutFile

Expert Contributor

Hi All,

I am trying to collect tweets using NiFi and store them in a local file.

I have configured the GetTwitter processor as follows:

Twitter Endpoint: Filter Endpoint
Twitter keys: all provided
Languages: en
Terms to Filter On: facebook,wipro,google

I am trying to write the tweets to a file using PutFile, configured as follows:

Directory: /root/tweets
Conflict Resolution Strategy: fail
Create Missing Directories: true
Maximum File Count: 1
Last Modified Time: (not set)
Permissions: rw-r--r--
Owner: root
Group: root

I am able to get the tweets, but each tweet is written to its own JSON file.

If I increase Maximum File Count to 20, it creates 20 JSON files, each containing only one tweet.

But I want all the tweets to be stored in a single JSON file.

In this thread:

https://community.hortonworks.com/questions/42149/using-nifi-to-collect-tweets-into-one-large-file.h...

they suggest using the MergeContent processor.

But I did not understand exactly how to use MergeContent, as I am completely new to NiFi.

Please suggest how to use it.

Should I use it before or after PutFile?

Please help.

Mohan.V

1 ACCEPTED SOLUTION


@Mohan V

Yes, MergeContent would be a solution to your problem. You have to use it before the PutFile processor so that multiple flow files (each containing one JSON document) are merged into one flow file (containing multiple JSON documents). You may want to have a look at the documentation here:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MergeContent/i...

In particular, the 'Minimum Number of Entries' property sets the minimum number of flow files (here, one JSON tweet each) to merge into a single file.

As a side note, when you have a processor on your canvas, you can right-click on it and go to 'Usage' to display the documentation of the processor.

Hope this helps.
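
To make the idea concrete, here is a minimal Python sketch of what merging before writing amounts to. It is a simplified model under stated assumptions, not NiFi's actual bin-packing logic: `merge_entries`, `min_entries`, and `demarcator` are hypothetical names chosen for illustration, and the newline separator is an assumption (in MergeContent the separator is configurable).

```python
from typing import Iterable, List


def merge_entries(entries: Iterable[str], min_entries: int,
                  demarcator: str = "\n") -> List[str]:
    """Group entries into bins of `min_entries` and join each full bin.

    Simplified model: a bin is emitted as soon as it holds `min_entries`
    items. Entries left in an incomplete bin are not emitted here; they
    would simply keep waiting for more input.
    """
    if min_entries < 1:
        raise ValueError("min_entries must be at least 1")

    merged: List[str] = []
    current: List[str] = []
    for entry in entries:
        current.append(entry)
        if len(current) >= min_entries:
            # One merged "file" containing several JSON documents.
            merged.append(demarcator.join(current))
            current = []
    return merged


if __name__ == "__main__":
    tweets = ['{"id": 1}', '{"id": 2}', '{"id": 3}']
    for merged_file in merge_entries(tweets, min_entries=2):
        print(merged_file)
```

With one JSON document per input, each merged result holds several tweets, one per line, which is what PutFile would then write as a single file.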


5 REPLIES


Expert Contributor

Thanks for your valuable suggestion, Pierre Villard.

It's done.

Explorer

Hello All,

I am facing a similar situation. I tried using MergeContent before PutFile, but it ends with the error: file with the same name already exists.

My data flow is:

ConsumeMQTT (reads source) -- MergeContent (merges files) -- PutFile (writes file)

Note:

If I set the "Conflict Resolution Strategy" in PutFile to "replace", the flow runs, but only the latest value gets stored in the file. I cannot append data to the same file using the data flow above.

I would appreciate your inputs.

Master Mentor

@Siraj

The NiFi PutFile processor does not support append. Append actions can cause all kinds of issues, especially in a NiFi cluster where the target directory of PutFile is mounted on all the NiFi nodes. You can't have multiple nodes trying to write or append to the same file at the same time.

My suggestion would be to use the UpdateAttribute processor before your PutFile to modify the filename attribute. Perhaps prepend a UUID to the filename to ensure uniqueness across multiple files or NiFi nodes (if clustered).

${UUID()}-${filename}
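
As a rough illustration of why this avoids the name conflict, the following Python sketch mimics the effect of the expression above. It is only an analogy, not NiFi code: `unique_filename` is a hypothetical helper, and it assumes a random UUID in front of the original name is enough to keep names from colliding.

```python
import uuid


def unique_filename(filename: str) -> str:
    """Prefix the original name with a random UUID, e.g. '<uuid>-tweets.json'."""
    return f"{uuid.uuid4()}-{filename}"


if __name__ == "__main__":
    # Two files with the same original name end up with different target names.
    print(unique_filename("tweets.json"))
    print(unique_filename("tweets.json"))
```

Every merged file then lands under its own name, so a conflict strategy of "fail" no longer rejects it. Note that this produces many files rather than one growing file.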

 

Hope this helps you,

Matt

Explorer

@MattWho,

Thank you for the details. I understand your point that the same file cannot be accessed across cluster nodes. However, I am using NiFi on a single node (without a cluster) and thought that this should work.

Yes, I do use UpdateAttribute with the file name convention below. This generates a separate flow file for every message. I am trying to have one single file per node.

${filename:replace(${filename},"FileName_"):append(${now():format("yyyy-MM-dd-HH-mm-ss")})}.Json
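
For readers less familiar with the expression language, here is a hedged Python approximation of what the expression above appears to compute. It assumes `replace` swaps the whole original name for the fixed prefix and that the date pattern resolves to the second; `timestamped_filename` is a hypothetical name, and the current time is passed in explicitly so the behaviour is easy to check.

```python
from datetime import datetime


def timestamped_filename(filename: str, now: datetime) -> str:
    """Approximate the naming convention: 'FileName_<timestamp>.Json'."""
    # Replacing the entire original name leaves only the fixed prefix.
    base = filename.replace(filename, "FileName_")
    # yyyy-MM-dd-HH-mm-ss corresponds to this strftime pattern (second resolution).
    return base + now.strftime("%Y-%m-%d-%H-%M-%S") + ".Json"


if __name__ == "__main__":
    first = timestamped_filename("msg-a", datetime(2020, 1, 2, 3, 4, 5))
    later = timestamped_filename("msg-b", datetime(2020, 1, 2, 3, 4, 6))
    print(first)
    print(later)
```

Under this reading, messages arriving in different seconds get different file names, which matches the one-file-per-message behaviour described above.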

 

Thank you