Created 03-02-2017 07:44 PM
We have a NiFi flow which uses PutHDFS.
We want it to put files of a specific size into HDFS each time (similar to how Flume does it with hdfs.rollInterval). I thought the Block Size property might control this, but setting it seems to break my file writing completely.
When I set Block Size to any value (I have tried 10kb, the syntax 10 KB, and a very small size like 500b), it runs without errors but no files show up in HDFS. If I remove the value from the Block Size property, it puts files into HDFS correctly, except that I want to specify their size.
Any insight is appreciated.
** As an update to this: I realized I was setting the Block Size lower than the minimum block size required, so it wasn't writing to HDFS. That said, when I changed it to the minimum of 1 GB, it still writes files to HDFS that are less than 1 KB in size, so maybe I'm not understanding how Block Size works? How does one specify the roll-over size for files being written to HDFS? **
Created 03-02-2017 07:52 PM
I take that back. I am receiving this error on the PutHDFS processor:
Specified block size is less than configured minimum value.
This persists even if I change the Block Size to 200 KB, which is larger than the files it writes when I leave the Block Size property empty.
Created 03-02-2017 07:56 PM
It's clear I misunderstand what Block Size is, because when I set it to 1 GB it is no longer below the minimum and it generates HDFS files - but they are all less than a KB. How the heck do you specify the file sizes for HDFS writes?
Created 03-02-2017 08:06 PM
You should use a MergeContent processor before PutHDFS to merge flow files together based on a minimum size.
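For example, a MergeContent configuration along these lines gives size-based rollup (the exact values below are illustrative, not prescriptive; tune them for your data rate):

    Merge Strategy: Bin-Packing Algorithm
    Merge Format: Binary Concatenation
    Minimum Group Size: 128 MB
    Maximum Group Size: 256 MB
    Maximum Number of Entries: 100000
    Max Bin Age: 5 min

Route the merged relationship to PutHDFS. Max Bin Age acts as a safety valve so a bin that never reaches the minimum size still gets flushed eventually, and Maximum Number of Entries usually needs to be raised from its default if your individual flow files are small. Note that the Block Size property on PutHDFS only sets the HDFS block size of the file being written (analogous to dfs.blocksize); it does not control how much data goes into each file.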
Created 03-02-2017 08:11 PM
Perfect. Thank you. Still learning how to rethink data ingestion when moving from Flume to NiFi.