How does NiFi handle huge 1 TB files?

Explorer

Hi

I'm looking into a requirement to transfer 1 TB files using chunking in NiFi.

Each file also has about 20 items of metadata associated with it, and both the metadata and the data need to survive breaking the file into roughly 1000 chunks (chunking) and re-assembling it at the destination (de-chunking). Also, is the metadata for the large file duplicated onto each of the 1000 chunks, or does each chunk carry only a subset of it?

Someone mentioned that NiFi passes the file chunk data through JVM memory on its way to the content repository.

Can someone confirm whether file chunks pass through JVM memory as they are written to the content repository, for a large file (or any file, for that matter)? I was fairly sure they don't, since otherwise the JVM heap size (limited by the machine's RAM) would constrain how much large-file data could be read in, and that would limit large-file transfer speed. Is that correct?

I'm trying to confirm my understanding of how NiFi handles these large files, please.

Any help appreciated.

 

1 REPLY

Expert Contributor

Hello @zzzz77

Glad to have you in the community.

What you are asking for can be done with this kind of flow:
GetFile → SplitContent → Transfer → MergeContent → PutFile
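
Roughly, each processor in that chain does the following (the "Transfer" step stands in for whatever transport you use between the two sides, for example a Remote Process Group / site-to-site connection; that part is an assumption about your setup):

GetFile       - picks up the 1 TB source file and creates a FlowFile for it
SplitContent  - splits the content into fragment FlowFiles; each fragment inherits a copy of the parent's attributes
Transfer      - moves the fragment FlowFiles to the destination system
MergeContent  - reassembles the fragments back into the original content
PutFile       - writes the reassembled file out to the destination directory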

SplitContent will split the file, and the attributes will be duplicated onto every chunk, because they are stored on the FlowFile, not in the content.
Additional attributes describing the fragmentation are added to each chunk as well.
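
For reference, these are the fragment-related attributes SplitContent writes on each chunk (names as documented for SplitContent in recent NiFi releases; please double-check against your version):

fragment.identifier       - the same ID on every chunk produced from one source file
fragment.index            - the position of this chunk within the original file
fragment.count            - the total number of chunks produced from the source file
segment.original.filename - the filename of the original (parent) FlowFile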

MergeContent will rebuild the content and carry the original attributes through properly,
so the metadata will not be lost.
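
On the receiving side, the key setting is MergeContent's Merge Strategy. A minimal sketch of the relevant properties, assuming you want the chunks stitched back together in their original order:

Merge Strategy : Defragment            (uses the fragment.* attributes above to order the chunks
                                        and waits until all fragment.count pieces have arrived)
Merge Format   : Binary Concatenation  (simply concatenates the chunk contents back into one file)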


Regards,
Andrés Fallas
--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs-up button.