How to find which directory data is stored in after a process completes in NiFi

Explorer

Hi All,

I created a flow in which the GetFile processor ran successfully. How do I find which directory the file is stored in?

Steps I followed:

1. Opened the web UI at localhost:8080/nifi/

2. Added a GetFile processor

3. Connected it to a LogAttribute processor via the success relationship

4. Started the flow

5. The file was moved from the local file system into a NiFi directory

How do I find which folder it was stored in?

1 ACCEPTED SOLUTION

Rising Star

Typically, the GetFile processor pulls the file into NiFi so that you can do some type of processing or routing. It doesn't really put the file anywhere in particular.

You should use something like the PutFile processor to move the file to a location of your choosing. Just make sure to route the success relationship to the PutFile processor and configure the PutFile processor to your liking.

GetFile

PutFile

11 REPLIES

Explorer

Thank you @zblanco

Master Guru
@AnjiReddy Anumolu

Just to add a little more detail to the above response from @zblanco.

When NiFi ingests data, that data is turned into NiFi FlowFiles. A NiFi FlowFile consists of attributes (metadata) about the data and the physical data itself. The FlowFile metadata is stored in the FlowFile repository as well as in JVM heap memory for faster performance. FlowFile attributes include things like filename, ingest time, lineage age, file size, which connection in the dataflow the FlowFile currently resides in, any user-defined metadata, processor-added metadata, etc. The physical bytes that make up the actual data content are written to claims within the NiFi content repository. A claim can contain the bytes for one to many ingested data files. For more info on the content repository and how claims work, see the following link:

https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi....

Thanks,

Matt

Normally, the content repository holds the content for all the FlowFiles in the system. By default, it is installed in the same root installation directory as all the other repositories; as an admin, you can configure it on a separate drive if one is available. For example, check {nifi_install_dir}/content_repository for its contents.

Explorer

Thank you @milind pandit

Hi @AnjiReddy Anumolu,

An easy way to get hold of your file is from provenance:

- In the NiFi UI, click the provenance button in the top right corner

5311-screen-shot-2016-06-28-at-72103-am.png

- Find the event for your file and click the "view details" button

5312-screen-shot-2016-06-28-at-72333-am.png

- You can view or download the file on the "contents" tab:

5313-screen-shot-2016-06-28-at-72527-am.png

If you need to see the file contents on your server, search the content_repository for the file whose name matches the "identifier" of the output claim (e.g. 1467063966583-11 in the screenshot above) and start reading at the "offset" (e.g. 463775 in the screenshot above).
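The lookup described above can be sketched in Python. This is only an illustration, not a NiFi tool: the function names are made up for this example, it assumes the claim file is stored uncompressed and unencrypted, and it takes the number of bytes to read from the size shown in the provenance event details.

```python
import os


def find_claim_file(content_repo_dir, claim_identifier):
    """Walk the content repository and return the path of the file whose
    name equals the claim identifier, or None if it is not found."""
    for root, _dirs, files in os.walk(content_repo_dir):
        if claim_identifier in files:
            return os.path.join(root, claim_identifier)
    return None


def read_flowfile_content(claim_path, offset, size):
    """Read `size` bytes starting at byte `offset` of the claim file."""
    with open(claim_path, "rb") as claim:
        claim.seek(offset)
        return claim.read(size)


# Hypothetical usage; the repository path and the size are placeholders:
# path = find_claim_file("/opt/nifi/content_repository", "1467063966583-11")
# data = read_flowfile_content(path, 463775, 1024)
```

Treat the repository as read-only when doing this; it is NiFi's internal storage, and the claim may already have been archived or deleted by the time you look.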

5314-screen-shot-2016-06-28-at-74030-am.png

Hope this helps!

Thanks!

Explorer

Thank you @Jobin George

Both of the above answers are correct. Just to provide a full picture of what is happening...

GetFile picks up the file from the directory and brings it into NiFi's content_repository, which, as milind pointed out, is by default located under {nifi_install_dir}/content_repository. This directory is not meant to be used by the user; it is for NiFi's internal purposes.

The FlowFile is then transferred to LogAttribute, which logs information. I assume that if that is the end of your flow, you must have marked the success relationship on LogAttribute as auto-terminated. At this point the FlowFile is removed from NiFi, and the content in the content repository will eventually be removed.

NiFi is not meant to be a storage system where you bring data in and then leave it there; your flow would have to send the data somewhere after GetFile.

Explorer

Thank you @Bryan Bende

Master Guru

@AnjiReddy Anumolu

Let me start off by making sure I fully understand the dataflow you have created, so I can better answer your question. You have added a GetFile processor to your flow, which picks up file(s) from a local file system directory and then sends them via the success relationship to a LogAttribute processor.

What did you do with LogAttribute's success relationship?

If it is auto-terminated, you are essentially telling NiFi you are done with the files once the FlowFile attributes/metadata of the file(s) have been logged successfully. If the success relationship has not been defined, the processor will remain invalid and cannot be run. In this case the file(s) picked up by the GetFile processor will remain queued on the connection between the GetFile processor and the LogAttribute processor.

In either case, when NiFi ingests file(s), they are placed in the NiFi content repository. The location of the content repository is defined/configured in the nifi.properties file. The default places them in a directory created within the default NiFi installation directory:

nifi.content.repository.directory.default=./content_repository 
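To check where the content repository lives on a particular install, the property above can be read straight out of nifi.properties. A minimal Python sketch follows; the function name is made up for this example, and it assumes that relative paths such as ./content_repository resolve against the NiFi installation directory.

```python
from pathlib import Path


def content_repository_dirs(properties_path, nifi_home):
    """Return {container_name: resolved_path} for every
    nifi.content.repository.directory.* entry in nifi.properties."""
    prefix = "nifi.content.repository.directory."
    dirs = {}
    for raw in Path(properties_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith(prefix):
            # Assumption: relative values are relative to the install dir.
            dirs[key[len(prefix):]] = (Path(nifi_home) / value.strip()).resolve()
    return dirs


# Hypothetical usage; /opt/nifi is a placeholder install directory:
# content_repository_dirs("/opt/nifi/conf/nifi.properties", "/opt/nifi")
```

With the default shown above, the result would contain a single entry named default pointing at content_repository under the install directory.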

NiFi stores file(s) in what are known as claims to make the most efficient use of the system's hard disks. A claim can contain one to many files. The default claim configuration is also defined/configured in the nifi.properties file. The default configuration is as follows:

nifi.content.claim.max.appendable.size=10 MB 
nifi.content.claim.max.flow.files=100

Files smaller than 10 MB may be stored with other files, with up to 100 total files in a single claim. If a file is larger than 10 MB, it will end up in a claim of one. At the same time as files are written to a claim, FlowFile attributes/metadata about the ingested files are written to the FlowFile repository. The location of the FlowFile repository is also defined/configured in the nifi.properties file:

nifi.flowfile.repository.directory=./flowfile_repository 
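To make the claim limits described above (10 MB and 100 files) more concrete, here is a toy Python model of that packing rule. It only mirrors the rule as stated in this post and is not NiFi's actual implementation; the function name is made up, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def assign_claims(file_sizes, max_appendable_size=10 * MB, max_flow_files=100):
    """Group file sizes into claims following the rule described in the post:
    a file larger than max_appendable_size gets a claim of its own, and
    smaller files share a claim until it holds max_flow_files files or
    reaches max_appendable_size bytes."""
    claims = []
    current = []
    for size in file_sizes:
        if size > max_appendable_size:
            claims.append([size])  # large file: claim of one
            continue
        current.append(size)
        if len(current) == max_flow_files or sum(current) >= max_appendable_size:
            claims.append(current)  # shared claim is full, close it
            current = []
    if current:
        claims.append(current)  # last, partially filled claim
    return claims
```

Under this model, 250 tiny files end up in three claims (100, 100 and 50 files), while a single 20 MB file gets a claim to itself.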

These FlowFile attributes/metadata will contain information such as filename, file size, location of the claim in the content repository, claim offset, etc. The claim offset is the starting byte location of a particular file's content within a claim. The file size defines the number of bytes from that offset that make up the complete data.

The nifi-app.log contains fairly robust logging by default (configured in the logback.xml file). When NiFi ingests files, it logs that, and the log line will contain information about the claim (location and offset). When NiFi auto-terminates FlowFiles, they are removed from the content repository. Depending on the content repository archive setup, the file(s) may be archived for a period of time. Archived file(s) can be replayed using the provenance view in the NiFi UI.

Thanks,

Matt