Created on 11-27-2017 04:09 PM - edited 08-17-2019 09:13 PM
I need to store a lot of small files (of different types) in HDFS so that the data can later be processed with Spark. I chose the Hadoop Sequence File format for storage in HDFS, and NiFi to merge, convert, and put the files into HDFS. I figured out how to load files and convert them to Sequence Files, but I am stuck at the merge stage. How can I merge several small Sequence Files into one bigger one? The MergeContent processor just merges content without handling the Hadoop Sequence File structure. A screenshot of my NiFi project is attached.
Created 11-27-2017 04:38 PM
You can use MergeContent before the CreateHadoopSequenceFile. If you have several types and you want to store them in separate files, use RouteOnAttribute before that.
Created 11-27-2017 05:26 PM
Sorry, I have read the MergeContent documentation and realized my mistake. Thank you!
Created 11-27-2017 05:02 PM
Hello @Abdelkrim Hadjidj
There are many file types, for example PNG, BMP, PDF, etc. And I think it is a bad idea, for example, to merge two PDF files directly. I think Sequence Files were developed to store small files efficiently. It is strange that the CreateHadoopSequenceFile processor has no capability to accumulate small files into a bigger file.
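To illustrate the idea behind this: a Hadoop Sequence File is essentially a flat file of key/value records, so many small files can live inside one container without their contents being merged together. The sketch below is a purely conceptual Python illustration of that record-packing idea using only the standard library; the function names are made up for this example, and this is not Hadoop's actual SequenceFile binary format.

```python
import io
import struct

# Conceptual sketch only: pack small (name, bytes) records into one
# byte stream, the way a SequenceFile stores key/value pairs.
# NOT the real Hadoop SequenceFile on-disk format.

def pack(records):
    """Serialize an iterable of (name, data) pairs into one byte stream."""
    buf = io.BytesIO()
    for name, data in records:
        key = name.encode("utf-8")
        # Length-prefix each key and value so records can be read back
        # individually; the payloads themselves are never concatenated.
        buf.write(struct.pack(">I", len(key)))
        buf.write(key)
        buf.write(struct.pack(">I", len(data)))
        buf.write(data)
    return buf.getvalue()

def unpack(blob):
    """Yield the (name, data) pairs back from a packed byte stream."""
    buf = io.BytesIO(blob)
    while True:
        header = buf.read(4)
        if not header:
            break
        key = buf.read(struct.unpack(">I", header)[0]).decode("utf-8")
        (vlen,) = struct.unpack(">I", buf.read(4))
        yield key, buf.read(vlen)

# Two unrelated small files (a PNG and a PDF) coexist in one container,
# each recoverable intact, which is why mixing types is not a problem.
small_files = [("a.png", b"\x89PNG..."), ("b.pdf", b"%PDF...")]
blob = pack(small_files)
assert list(unpack(blob)) == small_files
```

This is also why the suggested flow works: MergeContent only needs to bundle the raw small files, and CreateHadoopSequenceFile then wraps each one as a separate key/value record rather than blending, say, two PDFs into one.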