I have a large number of small files in HDFS that need to be zipped.
The simple approach is to fetch them to the edge node, zip them there, and push the zipped file back to HDFS.
Since there are millions of files, my shell script groups them into batches of 9,000 files per zip file and runs 20 workers in parallel. So 20 folders, each holding 9,000 files, are fetched from HDFS (using `hdfs dfs -get`); each folder is zipped and the zip file is put back into HDFS.
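To make the grouping logic concrete, here is a minimal dry-run sketch of what each worker would do. All paths, variable names, and the small `CHUNK_SIZE` are illustrative placeholders (the real values are 9,000 files per chunk and 20 parallel workers), and the `hdfs` commands are only echoed, not executed:

```shell
#!/usr/bin/env bash
# Dry-run sketch: split a file listing into fixed-size chunks and print,
# for each chunk, the hdfs get / zip / hdfs put commands a worker would run.
CHUNK_SIZE=3      # illustrative; real value is 9000
MAX_PARALLEL=2    # illustrative; real value is 20

workdir=$(mktemp -d)

# Simulated HDFS listing (the real script would use: hdfs dfs -ls -C /source/dir)
printf '%s\n' file1 file2 file3 file4 file5 file6 file7 > "$workdir/filelist.txt"

# Split the listing into chunk files: chunk_aa, chunk_ab, ...
split -l "$CHUNK_SIZE" "$workdir/filelist.txt" "$workdir/chunk_"

i=0
for chunk in "$workdir"/chunk_*; do
  i=$((i + 1))
  # Each worker would: fetch its chunk to a local folder, zip it, push it back.
  echo "worker $i: hdfs dfs -get \$(cat $chunk) local_dir_$i/"
  echo "worker $i: zip -r batch_$i.zip local_dir_$i/"
  echo "worker $i: hdfs dfs -put batch_$i.zip /target/dir/"
done
echo "total chunks: $i (run at most $MAX_PARALLEL at a time, e.g. with xargs -P)"
```

With 7 listed files and a chunk size of 3, this prints the commands for 3 workers.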
While this script runs perfectly fine when tested from the edge node, it fails when integrated into an Oozie workflow, with a `-get: FileSystem closed` error.
From the error message, it looks like the FileSystem object was closed while hdfs was still fetching files to the edge node, but I am not sure.
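If a shared, cached FileSystem instance is indeed being closed underneath the parallel `-get` calls, one mitigation I have seen suggested (an assumption on my part, not yet verified in my workflow) is to disable the client-side FileSystem cache on each command, so every invocation gets its own instance:

```shell
# Hypothetical mitigation (untested here): give this command its own
# FileSystem instance instead of the shared cached one.
# /source/dir and local_dir are placeholder paths.
hdfs dfs -Dfs.hdfs.impl.disable.cache=true -get /source/dir/* local_dir/
```

Has anyone confirmed whether the Oozie shell action's use of the cached FileSystem is what triggers this, and whether disabling the cache is the right fix?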