Files under construction always increasing and “Exception in createBlockOutputStream” in Storm topology writing to HDFS

After several days of running, I start getting “Exception in createBlockOutputStream” in a Storm topology that writes to HDFS with the HDFS bolt. The partitioner set in the HDFS bolt resolves to year=[year]/month=[month]/day=[day]/hour=[hour] (for example, year=2020/month=06/day=26/hour=19), so the bolt creates new files every hour.
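To make the hourly layout concrete, here is a minimal standalone sketch of how a timestamp maps to the partition path described above. The class and method names (HourlyPartitionPath, forTimestamp) are hypothetical, and the choice of time zone is an assumption. In the actual topology this string is whatever the configured partitioner resolves to. Every new hour yields a new path, and therefore a new file.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper (not part of storm-hdfs): builds the hourly
 * partition path year=YYYY/month=MM/day=DD/hour=HH for a timestamp.
 */
final class HourlyPartitionPath {
    // Quoted text is literal; MM, dd and HH are zero-padded, matching year=2020/month=06/day=26/hour=19
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("'year='yyyy'/month='MM'/day='dd'/hour='HH");

    static String forTimestamp(long epochMillis, ZoneId zone) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(zone);
        return FORMAT.format(t);
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC"); // assumed zone; the real partitioner may use another
        long ts = ZonedDateTime.of(2020, 6, 26, 19, 42, 0, 0, utc).toInstant().toEpochMilli();
        System.out.println(forTimestamp(ts, utc)); // prints year=2020/month=06/day=26/hour=19
    }
}
```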

When I leave the topology running for several days, I get “Exception in createBlockOutputStream”. When the error appears, I can also see that HDFS starts reporting Under Replicated Blocks, with a count greater than zero that keeps increasing. I also found that while the topology is running, the Number of Files Under Construction keeps increasing, and so does the HDFS Total Load. Both drop to zero when I restart the topology.

The file rotation policy is configured at 120 MB, and the sync policy is set to 500,000 tuples. The HDFS files are written in Avro format, so I am using the AvroGenericRecordBolt class. HDFS runs in HA mode.
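With hourly partitions, a size-based rotation only closes a file once that hour's file reaches 120 MB. A sync flushes data but does not close the file. The simplified model below illustrates how files for past hours can stay open if nothing else closes them. It is not storm-hdfs code. HourlyWriterModel is a hypothetical name, "120 MB" is taken as 120 × 1024 × 1024 bytes, and the per-hour volume in main is an assumed figure, not a measurement.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model (not the storm-hdfs API) of a bolt that writes
 * one file per hourly partition, rotates a file only when it reaches a size
 * limit, and syncs every N tuples. It counts files that are still open.
 */
final class HourlyWriterModel {
    static final long ROTATION_BYTES = 120L * 1024 * 1024; // size-based rotation at 120 MB
    static final long SYNC_EVERY_TUPLES = 500_000L;        // count-based sync

    // partition path -> bytes written to that partition's currently open file
    private final Map<String, Long> openFileBytes = new HashMap<>();
    private long tuplesSinceSync = 0; // simplification: a single global tuple counter
    private long syncs = 0;
    private long rotations = 0;

    void write(String partition, long tupleBytes) {
        long size = openFileBytes.merge(partition, tupleBytes, Long::sum);
        if (++tuplesSinceSync >= SYNC_EVERY_TUPLES) {
            syncs++;              // a sync flushes data; the file stays open (under construction)
            tuplesSinceSync = 0;
        }
        if (size >= ROTATION_BYTES) {
            rotations++;          // rotation closes the file...
            openFileBytes.remove(partition); // ...and the next write opens a fresh one
        }
    }

    int openFiles() { return openFileBytes.size(); }
    long rotations() { return rotations; }
    long syncs() { return syncs; }

    public static void main(String[] args) {
        HourlyWriterModel model = new HourlyWriterModel();
        // Assumed load: 100,000 tuples of 500 bytes per hour (about 48 MB/hour) for 72 hours.
        for (int hour = 0; hour < 72; hour++) {
            String partition = "hour-" + hour;
            for (int i = 0; i < 100_000; i++) {
                model.write(partition, 500);
            }
        }
        // prints open files: 72, rotations: 0
        System.out.println("open files: " + model.openFiles() + ", rotations: " + model.rotations());
    }
}
```

Under these assumed numbers, no hourly file ever reaches 120 MB, so none is rotated and one open file accumulates per hour.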

Versions:

  • Storm: 1.1.0
  • HDFS: 2.7.3

So the problem seems to be that the HDFS bolt keeps files under construction indefinitely, until the topology is restarted. Is there any way to avoid this? Is there a configuration setting that closes files after some time? Any other ideas?
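One way to think about "closing files after some time" is to record when each partition's writer last received data. Writers that have been idle longer than a timeout can then be closed, for example whenever a tick tuple arrives. The sketch below shows only that idea. IdleWriterCloser, writerFor, closeIdle and the 10-minute timeout are all hypothetical and are not part of storm-hdfs. Whether this can be wired into AvroGenericRecordBolt on Storm 1.1.0 is an open assumption.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch (not a storm-hdfs class): one open writer per partition
 * path; writers idle longer than idleTimeoutMillis are closed and forgotten.
 */
final class IdleWriterCloser<W extends Closeable> {
    private static final class Entry<T> {
        final T writer;
        long lastWriteMillis;
        Entry(T writer, long now) { this.writer = writer; this.lastWriteMillis = now; }
    }

    private final Map<String, Entry<W>> writers = new HashMap<>();
    private final long idleTimeoutMillis;
    private final Function<String, W> opener; // opens a new file for a partition path

    IdleWriterCloser(long idleTimeoutMillis, Function<String, W> opener) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.opener = opener;
    }

    /** Returns the writer for a partition, opening one if needed, and records the write time. */
    W writerFor(String partition, long nowMillis) {
        Entry<W> e = writers.computeIfAbsent(partition, p -> new Entry<>(opener.apply(p), nowMillis));
        e.lastWriteMillis = nowMillis;
        return e.writer;
    }

    /** Closes writers idle for longer than the timeout; returns how many were closed. */
    int closeIdle(long nowMillis) throws IOException {
        int closed = 0;
        Iterator<Map.Entry<String, Entry<W>>> it = writers.entrySet().iterator();
        while (it.hasNext()) {
            Entry<W> e = it.next().getValue();
            if (nowMillis - e.lastWriteMillis > idleTimeoutMillis) {
                e.writer.close(); // a closed file is no longer under construction
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    int openCount() { return writers.size(); }

    public static void main(String[] args) throws IOException {
        long minute = 60 * 1000L;
        // Stand-in writer that only reports when it is closed; 10-minute timeout is an assumption.
        IdleWriterCloser<Closeable> closer = new IdleWriterCloser<Closeable>(
            10 * minute, path -> () -> System.out.println("closed " + path));
        closer.writerFor("year=2020/month=06/day=26/hour=19", 0);
        closer.writerFor("year=2020/month=06/day=26/hour=20", 60 * minute);
        // At 65 minutes only the hour=19 writer is idle beyond the timeout.
        System.out.println(closer.closeIdle(65 * minute) + " closed, " + closer.openCount() + " open");
    }
}
```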

Thanks!
