We are using Flume to ingest data into HDFS.
The Flume HDFS sink is configured with fileType=CompressedStream and codeC=snappy.
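For reference, the relevant part of the sink configuration looks roughly like this (the agent, sink, and path names are placeholders; only fileType and codeC are from our actual setup):

```properties
# Hypothetical agent/sink names for illustration
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.hdfs-sink.hdfs.fileType = CompressedStream
agent1.sinks.hdfs-sink.hdfs.codeC = snappy
```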
If for any reason the Flume agent dies or the NameNode (HA) restarts, the current sink file is left open with a .tmp extension, for example testfile.snappy.tmp.
Is there any way to decompress such a file and get the data back in human-readable form?
Or is there a tool to repair such files?
We could switch to any other compression codec if that would make the files repairable.
AFAIK there are options to discover and 'repair' corrupted files stored in HDFS. The most common causes of file corruption are missing or corrupt HDFS blocks, and HDFS can act automatically to fix such files periodically, depending on the cause (missing block, checksum mismatch, etc.). In your case, however, the file itself is still open and is not considered 'complete' or 'closed' by HDFS, so unless you have a way to recreate the entire file from the source system by reprocessing it, such files can't be 'fixed'.
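To find which files HDFS still considers open, and to ask the NameNode to close a stuck .tmp file at its last replicated block, commands along these lines can help (the paths below are placeholders, not from the question):

```shell
# List files under the Flume landing directory that are still open for write
hdfs fsck /flume/events -openforwrite -files -blocks

# Trigger lease recovery on a specific stuck file; this closes the file
# at the last successfully replicated block, so trailing data may be lost
hdfs debug recoverLease -path /flume/events/testfile.snappy.tmp -retries 3
```

Note that recoverLease only closes the file; it does not repair a compressed stream that was cut off mid-block, so the tail of the Snappy stream may still be unreadable.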