Support Questions

Find answers, ask questions, and share your expertise

NiFi: merge files from local FS and put to HDFS


I need to store a lot of small files (of different types) in HDFS so that the data can be processed with Spark. I chose the Hadoop SequenceFile format for storage in HDFS, and NiFi to merge, convert, and put the files into HDFS. I found out how to load the files and convert them to SequenceFiles, but I am stuck at the merge stage. How can I merge several small SequenceFiles into one bigger one? The MergeContent processor just merges content without handling the Hadoop SequenceFile structure. A screenshot of my NiFi flow is attached.

43714-screenshot.png

1 ACCEPTED SOLUTION


Hi @Rustam Fatkullin

You can use MergeContent before CreateHadoopSequenceFile. If you have several file types and want to store them in separate files, add RouteOnAttribute before it.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.n...
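In plain terms, the suggested chain is RouteOnAttribute → MergeContent → CreateHadoopSequenceFile. Here is a minimal sketch of that routing-and-bundling logic in plain Python. This is purely illustrative, not NiFi code; all function and field names are hypothetical.

```python
# Illustrative sketch of the flow described above (NOT NiFi code).
# A "flowfile" is modeled as a dict with "filename" and "content" keys.
from collections import defaultdict

def route_on_extension(flowfiles):
    """RouteOnAttribute analogue: bucket flowfiles by file extension."""
    buckets = defaultdict(list)
    for ff in flowfiles:
        ext = ff["filename"].rsplit(".", 1)[-1].lower()
        buckets[ext].append(ff)
    return buckets

def merge_bucket(bucket):
    """MergeContent analogue: keep (filename, content) pairs together so a
    downstream CreateHadoopSequenceFile step can emit one record per file."""
    return [(ff["filename"], ff["content"]) for ff in bucket]

files = [
    {"filename": "a.png", "content": b"\x89PNG..."},
    {"filename": "b.pdf", "content": b"%PDF..."},
    {"filename": "c.png", "content": b"\x89PNG..."},
]
buckets = route_on_extension(files)
merged = {ext: merge_bucket(b) for ext, b in buckets.items()}
```

The key point is that merging happens *before* the SequenceFile conversion, so each file survives as its own record instead of being concatenated into one opaque blob.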


3 REPLIES



Sorry, I have read the MergeContent documentation and realized my mistake. Thank you!


Hello @Abdelkrim Hadjidj

There are many types of files, for example PNG, BMP, PDF, etc., and I think it would be a bad idea, for example, to merge two PDF files into one stream. My understanding is that SequenceFiles were designed precisely to store small files efficiently. It is strange that the CreateHadoopSequenceFile processor has no capability to accumulate small files into one bigger file by itself.
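The idea behind a SequenceFile here is exactly this: many small files become (key, value) records inside one large HDFS file, so no individual file's bytes are mixed with another's, and the NameNode tracks one file instead of thousands. A minimal, purely illustrative Python sketch of that record layout (this is not the real SequenceFile binary format):

```python
# Illustrative record container: (filename, bytes) pairs packed into one blob,
# mimicking the key/value idea of a Hadoop SequenceFile (not its real format).
import struct

def pack(records):
    """Pack (name, data) pairs: each record is name-length, data-length,
    then the name bytes and data bytes."""
    out = bytearray()
    for name, data in records:
        nb = name.encode()
        out += struct.pack(">II", len(nb), len(data)) + nb + data
    return bytes(out)

def unpack(blob):
    """Recover the original (name, data) pairs from a packed blob."""
    records, i = [], 0
    while i < len(blob):
        nlen, dlen = struct.unpack_from(">II", blob, i)
        i += 8
        name = blob[i:i + nlen].decode(); i += nlen
        data = blob[i:i + dlen]; i += dlen
        records.append((name, data))
    return records

blob = pack([("a.png", b"\x89PNG"), ("b.pdf", b"%PDF")])
```

Because each record carries its own filename and length, a PDF and a PNG can live in the same container without ever being "merged" at the content level.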