I get "Exception in createBlockOutputStream" in a Storm topology that writes to HDFS through the HDFS bolt, after the topology has been running for several days.

The partitioner set in the HDFS bolt resolves to year=[year]/month=[month]/day=[day]/hour=[hour] (example: year=2020/month=06/day=26/hour=19), so the bolt creates new files each hour.

When I leave the topology running for several days, I get the exception "Exception in createBlockOutputStream". When the error appears, I can also see that Under Replicated Blocks in HDFS rises above 0 and keeps increasing. I also found that while the topology is running, the Number of Files Under Construction keeps increasing, and so does the HDFS Total Load. Both drop to zero when I restart the topology.

The file rotation policy is configured at 120 MB. The sync policy is set to 500,000 tuples. The HDFS files are written in Avro format, so I am using the AvroGenericRecordBolt class. I am using HDFS in HA mode.

Versions:
Storm: 1.1.0
HDFS: 2.7.3

So it seems the problem is that the HDFS bolt keeps the files under construction forever, until the topology is restarted. Is there any way to avoid this? Is there a configuration setting that closes the files after some time? Any other ideas?

Thanks!
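
For illustration, a partition path of the shape described above could be derived from a timestamp roughly as in the following minimal sketch. This is plain Java, not the Storm Partitioner interface and not the actual partitioner used in the topology; the class name PartitionPathSketch, the method name partitionPath, and the use of UTC are assumptions made only for the example.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps a timestamp to an hourly partition path.
class PartitionPathSketch {

    // Literal text is quoted; yyyy/MM/dd/HH are zero-padded date fields.
    // UTC is an assumption; the real topology may use another time zone.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("'year='yyyy'/month='MM'/day='dd'/hour='HH")
                    .withZone(ZoneOffset.UTC);

    // Returns a path such as year=2020/month=06/day=26/hour=19.
    static String partitionPath(Instant eventTime) {
        return FORMAT.format(eventTime);
    }

    public static void main(String[] args) {
        // Two timestamps in different hours resolve to different directories,
        // so every new hour starts a new set of output files.
        System.out.println(partitionPath(Instant.parse("2020-06-26T19:05:00Z")));
        System.out.println(partitionPath(Instant.parse("2020-06-26T20:05:00Z")));
    }
}
```

With a scheme like this, each hour maps to a distinct directory, so a file opened for a past hour receives no further tuples once the hour is over.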