I am running an EC2 cluster backed by S3. When I run a Hive query or a Hadoop command that operates on very large data, temporary files are written to the local disk on the nodes before/after the data is copied to/from S3. I know the location can be configured with the 'fs.s3.buffer.dir' property. Ideally these files should be deleted afterwards, and usually they are, but in some cases they are not, so .tmp files accumulate (in GBs) on all the nodes and cause disk-space issues. Is there any way to avoid the creation of these .tmp files? Or, failing that, can we identify why they are sometimes left behind and correct it? Please suggest the best solution in this case.
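As a stopgap while investigating the root cause, one common approach is a periodic cleanup job that removes only stale .tmp files from the buffer directory. The sketch below is an assumption-laden illustration: `BUFFER_DIR` must be pointed at whatever your 'fs.s3.buffer.dir' actually is, and the 24-hour age cutoff is an arbitrary safety margin so files that a running job is still writing are left alone.

```shell
#!/bin/sh
# Hypothetical cleanup sketch -- BUFFER_DIR is an assumption; set it to
# the value of fs.s3.buffer.dir from your core-site.xml.
BUFFER_DIR="${BUFFER_DIR:-/tmp/s3buf-demo}"

# Demo setup (illustration only): one stale .tmp file, one fresh one.
mkdir -p "$BUFFER_DIR"
touch -d '2 days ago' "$BUFFER_DIR/stale.tmp"
touch "$BUFFER_DIR/fresh.tmp"

# Delete .tmp files not modified for more than 24 hours (1440 minutes).
# Files an active job is still writing are newer than that, so they
# are not disturbed.
find "$BUFFER_DIR" -type f -name '*.tmp' -mmin +1440 -delete
```

Run from cron (e.g. hourly) on each node, this keeps the leaked files from filling the disk; it does not fix why the Hadoop S3 code leaves them behind in the first place.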