
distcp doesn't preserve ACLs, and sometimes permissions, on non-HDFS filesystems


I found a problem with distcp on non-HDFS filesystems: in practice, it doesn't preserve ACLs or extended attributes.


This could be a huge problem if you back up HDFS to an external system such as S3, NFS or tape.

It matters even more with very long-retention cloud storage such as AWS Glacier.


This is typical output from a distcp to S3 using the new s3a filesystem:


 hadoop distcp  -Dfs.s3a.access.key=**************************** -Dfs.s3a.secret.key=*****************   -pa /******/*******     s3a://*********/***********
16/04/26 14:26:27 INFO Configuration.deprecation: fs.s3a.awsAccessKeyId is deprecated, use fs.s3a.access.key instead.
16/04/26 14:26:27 INFO Configuration.deprecation: fs.s3a.awsSecretAccessKey is deprecated, use fs.s3a.secret.key instead.
16/04/26 14:26:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/******/********], targetPath=s3a://*********/*********, targetPathExists=false, preserveRawXattrs=false, filtersFile='null'}
16/04/26 14:26:32 INFO client.RMProxy: Connecting to ResourceManager at ***************/**************
16/04/26 14:26:33 ERROR tools.DistCp: ACLs not supported on at least one file system:$AclsNotSupportedException: ACLs not supported for file system: s3a://testmat




The same problem also occurs with XFS or any other filesystem that supports ACLs/xattrs.


Moreover, for s3a/s3/s3n I could not find a way to preserve not only ACLs but even ordinary permissions.

All objects in S3 end up owned by the bucket owner, even when the "-p" option is used.
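One possible workaround, which is not something distcp does for you: dump ownership and ACLs with `hdfs dfs -getfacl -R <path>` before the copy, keep that dump alongside the backup, and replay it when restoring into HDFS. The Python sketch here parses such a dump and rebuilds `hdfs dfs -chown` and `hdfs dfs -setfacl --set` commands. It assumes the dump uses the `# file:` / `# owner:` / `# group:` header layout that HDFS getfacl prints, with a blank line between files; the function names and the sample paths, users and groups are hypothetical.

```python
"""Sketch: turn a saved `hdfs dfs -getfacl -R` dump into restore commands.

Assumption: headers look like "# file: ...", "# owner: ...", "# group: ...",
followed by ACL entries. Function names and sample data are hypothetical.
"""
import shlex


def parse_getfacl_dump(text):
    """Return a list of {"file", "owner", "group", "entries"} dicts."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate files
        if line.startswith("# file:"):
            current = {"file": line.split(":", 1)[1].strip(),
                       "owner": None, "group": None, "entries": []}
            records.append(current)
        elif current is None:
            continue  # ignore anything before the first "# file:" header
        elif line.startswith("# owner:"):
            current["owner"] = line.split(":", 1)[1].strip()
        elif line.startswith("# group:"):
            current["group"] = line.split(":", 1)[1].strip()
        elif line.startswith("#"):
            continue  # skip other headers, e.g. "# flags:"
        else:
            # Drop trailing "#effective:..." annotations on masked entries.
            current["entries"].append(line.split("#", 1)[0].strip())
    return records


def restore_commands(records):
    """Build chown and setfacl --set commands for each parsed record."""
    cmds = []
    for rec in records:
        path = shlex.quote(rec["file"])
        if rec["owner"] and rec["group"]:
            owner = shlex.quote(f"{rec['owner']}:{rec['group']}")
            cmds.append(f"hdfs dfs -chown {owner} {path}")
        if rec["entries"]:
            spec = shlex.quote(",".join(rec["entries"]))
            cmds.append(f"hdfs dfs -setfacl --set {spec} {path}")
    return cmds


# Hypothetical sample dump (the tab before "#effective" mimics getfacl output).
DUMP = """# file: /data/project
# owner: etl
# group: analysts
user::rwx
user:bruce:rwx\t#effective:r-x
group::r-x
mask::r-x
other::---

# file: /data/project/part-0000
# owner: etl
# group: analysts
user::rw-
group::r--
other::---
"""

if __name__ == "__main__":
    for cmd in restore_commands(parse_getfacl_dump(DUMP)):
        print(cmd)
```

Generating plain commands rather than calling HDFS directly keeps the restore step reviewable before it runs; chown normally needs HDFS superuser rights, so the generated script would be run as that user.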


I think this is an important limitation of distcp in scenarios where long-term backup is needed.




Kind regards