NiFi: merge files from local FS and put to HDFS

New Member

I need to store a large number of small files of different types in HDFS so that the data can be processed with Spark. I chose the Hadoop Sequence File format for storage in HDFS, and NiFi to merge the files, convert them, and put them into HDFS. I worked out how to load the files and convert them to Sequence Files, but I am stuck at the merge stage. How can I merge several small Sequence Files into one bigger file? The MergeContent processor just merges the content without handling the Hadoop Sequence File structure. A screenshot of my NiFi flow is attached.

43714-screenshot.png

1 ACCEPTED SOLUTION

Hi @Rustam Fatkullin

You can use MergeContent before CreateHadoopSequenceFile. If you have several file types and want to store them in separate files, put a RouteOnAttribute processor before MergeContent.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.n...
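
To illustrate why merging first does not blend the files together, here is a minimal Python sketch of the idea. It is not NiFi code: `bundle_as_tar`, `unpack_to_pairs` and the sample data are hypothetical names made up for this example. It assumes MergeContent is set to a packaged Merge Format such as TAR (as far as I recall from the CreateHadoopSequenceFile documentation, TAR, ZIP and FlowFile Stream v3 are the supported ones) rather than binary concatenation, so each file keeps its own name and content and can become one key/value record.

```python
import io
import tarfile


def bundle_as_tar(files):
    """Pack {name: bytes} into one in-memory TAR archive, keeping entry boundaries."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def unpack_to_pairs(bundle):
    """Return (file name, content) pairs, the shape a SequenceFile record would have."""
    pairs = []
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as archive:
        for member in archive.getmembers():
            pairs.append((member.name, archive.extractfile(member).read()))
    return pairs


# Hypothetical sample data standing in for small files of different types.
small_files = {
    "scan.pdf": b"%PDF-1.4 fake pdf bytes",
    "logo.png": b"\x89PNG fake png bytes",
}

# Plain concatenation loses the boundaries between the files.
concatenated = b"".join(small_files.values())

# A TAR bundle keeps each name and its content separate and recoverable.
pairs = unpack_to_pairs(bundle_as_tar(small_files))
for name, content in pairs:
    print(name, len(content))
```

In the flow itself this would roughly correspond to RouteOnAttribute (optional, to separate types), then MergeContent with a packaged format, then CreateHadoopSequenceFile, then PutHDFS.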

3 REPLIES

New Member

Sorry. I have read MergeContent documentation and realized my mistake. Thank you!

New Member

Hello @Abdelkrim Hadjidj

There are many types of files, for example png, bmp, pdf, etc. I think it is a bad idea to merge, for example, two pdf files. I believe Sequence Files were designed to store small files efficiently, so it is strange that the CreateHadoopSequenceFile processor cannot accumulate small files into a bigger file.