
Block size in PutHDFS in Nifi preventing HDFS files being written

Expert Contributor

We have a NiFi flow which uses PutHDFS.

We want it to put files of a specific size into HDFS each time (similar to what Flume does with hdfs.rollInterval). I thought the Block Size property might do this, but setting it seems to break my file writing completely.

When I set Block Size to any value (I have tried 10kb, the syntax 10 KB, and a very small size like 500b), it runs without errors but no files show up in HDFS. If I remove the value from the Block Size property, it puts correct files into HDFS, except I want to control their size.

Any insight is appreciated.

** As an update to this: I realized I was setting Block Size lower than the minimum block size required, so it wasn't writing to HDFS. That said, when I changed it to the minimum of 1 GB, it still writes files to HDFS that are smaller than 1 KB, so maybe I'm not understanding how Block Size works? How does one specify the roll-over size for files being written to HDFS? **
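(Editor's note, not from the original thread: in HDFS, PutHDFS's Block Size corresponds to the storage parameter dfs.blocksize, which controls how a file is split across DataNodes; it is not a roll-over size, and a file smaller than one block still occupies only its actual length. Assuming a running cluster and an illustrative path, both values can be inspected with:

```
# %o prints the block size, %b the actual file length in bytes
hdfs dfs -stat "block size: %o, file length: %b" /data/nifi/example.txt
```

This is why a 1 GB block size can coexist with sub-kilobyte files.)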

1 ACCEPTED SOLUTION

Master Guru

You should use a MergeContent processor before PutHDFS to merge flow files together based on a minimum size.
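(Editor's note: the property values below are illustrative, not from the original post. A MergeContent configuration that rolls merged files at roughly 128 MB might look like:

```
Merge Strategy         Bin-Packing Algorithm
Merge Format           Binary Concatenation
Minimum Group Size     128 MB
Maximum Group Size     256 MB
Max Bin Age            5 min
```

Max Bin Age ensures a partially filled bin is still flushed eventually, so slow sources do not stall the flow.)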


4 REPLIES 4

Expert Contributor

I take that back. I am receiving this error on the PutHDFS processor:

Specified block size is less than configured minimum value.

This error persists even when I change Block Size to 200kb, which is larger than the files PutHDFS writes when I put nothing into the Block Size property.

Expert Contributor

It's clear I misunderstand what Block Size is, because when I set it to 1 GB it is no longer below the minimum and it generates HDFS files - but they are all less than 1 KB. How the heck do you specify the file sizes for the HDFS writes?

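(Editor's note: the behaviour MergeContent provides can be pictured as simple size-based binning: accumulate incoming flow file content until a minimum total size is reached, then emit the bin as one file. A toy sketch of that idea, with names of my own choosing rather than NiFi's:

```python
def bin_by_min_size(chunks, min_size):
    """Group byte chunks into bins, emitting a bin once it reaches
    min_size bytes (mirrors MergeContent's Minimum Group Size)."""
    bins, current, total = [], [], 0
    for chunk in chunks:
        current.append(chunk)
        total += len(chunk)
        if total >= min_size:
            bins.append(b"".join(current))
            current, total = [], 0
    if current:
        # Leftover below the threshold; MergeContent would flush
        # such a bin anyway once Max Bin Age expires.
        bins.append(b"".join(current))
    return bins
```

Each emitted bin is at least min_size bytes, except possibly the final partial one.)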

Expert Contributor

Perfect, thank you. Still learning how to rethink data ingestion when moving from Flume to NiFi.