
Is there a way to find the person who deleted a directory on HDFS?

Contributor
 
5 REPLIES

Super Collaborator

@Sankar T

hdfs-audit.log will have that info.

Expert Contributor

@Sankar T The Ranger audit logs will also have it, if you have Ranger installed and its HDFS plugin enabled. In general, if you're concerned about who does what on your system, you should consider using Ranger at least, and possibly Atlas as well.


...and if it was done from the command line, it normally wouldn't have been deleted outright; it would have been moved to the user's .Trash folder (assuming trash is enabled and -skipTrash was not used).

Rising Star

@Sankar T

On an Ambari-installed cluster, check /grid/0/log/hdfs/hdfs/hdfs-audit.log to find out who deleted a directory or file in HDFS. For example:

2017-03-09 00:04:18,495 INFO FSNamesystem.audit: allowed=true ugi=ambari-qa-cl1@EXAMPLE.COM (auth:KERBEROS) ip=/172.xx.xx.xx      cmd=delete src=/tmp/hive/ambari-qa/388b15de-5e3f-4b7a-8069-d939b64e513e dst=null perm=null proto=rpc
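A line like the one above can be picked apart with a small script. This is only a rough sketch, assuming every audit line has the same shape as the sample (timestamp first, then key=value fields); the function names parse_audit_line and find_deletes are made up for illustration and are not part of Hadoop.

```python
import re
from datetime import datetime

# A value runs until the next whitespace-separated "key=" or the end of the line.
# Assumption: paths contain no " word=" sequence; such paths would be split wrongly.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")

def parse_audit_line(line):
    """Parse one audit line into a dict, or return None if it is not an audit line."""
    head, marker, payload = line.partition("FSNamesystem.audit:")
    if not marker:
        return None
    record = {key: value.strip() for key, value in FIELD_RE.findall(payload)}
    # Assumption: the timestamp is the first 23 characters, as in the sample line.
    record["time"] = datetime.strptime(head[:23], "%Y-%m-%d %H:%M:%S,%f")
    # ugi looks like "user@REALM (auth:KERBEROS)"; keep only the principal.
    record["user"] = record.get("ugi", "").split(" ")[0]
    return record

def find_deletes(lines, path):
    """Yield (time, user) for every permitted delete whose src is exactly `path`."""
    for line in lines:
        record = parse_audit_line(line)
        if (record
                and record.get("cmd") == "delete"
                and record.get("allowed") == "true"
                and record.get("src") == path):
            yield record["time"], record["user"]
```

To use it, open the audit log as a text file and pass the file object and the full HDFS path to find_deletes; each result is the time of the delete and the user who issued it.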

Rising Star

Yes, the audit log will serve the purpose. Note that searching the log for the deletion is not always straightforward: a directory (or file) may not have been deleted directly, but as part of the deletion of its parent or another ancestor directory. So first search the log for the full path. If it is not found, search for the parent directory's path, and so on up the tree.
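The "full path first, then parent, and so on" search could look roughly like this. It is a sketch only: it assumes the audit lines have already been parsed into a list of dicts with 'cmd' and 'src' keys, and the names ancestors and find_delete_candidates are hypothetical.

```python
import posixpath

def ancestors(path):
    """Yield the path itself, then each ancestor, stopping before '/'."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    path = posixpath.normpath(path)
    while path != "/":
        yield path
        path = posixpath.dirname(path)

def find_delete_candidates(records, target):
    """Return (path, matching delete records) for the nearest path that was deleted.

    records: a list of dicts with at least 'cmd' and 'src' keys (an assumption
    about how the audit lines were parsed, not a Hadoop data structure).
    """
    for candidate in ancestors(target):
        hits = [r for r in records
                if r["cmd"] == "delete" and r["src"] == candidate]
        if hits:
            return candidate, hits
    return None, []
```

Note that this only finds the nearest path with a delete entry. It does not check whether that delete happened while the target actually existed.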

It gets more complicated if deletion and re-creation occurred repeatedly. For example:

1) user A: create /foo

2) user A: create /foo/bar

3) user A: del /foo

4) user B: create /foo

5) user B: del /foo

Who deleted /foo/bar? It is easy to mistakenly take user B as the answer. B is the last user who deleted /foo, but B is not the user who deleted /foo/bar. In such cases, we should first determine when the target directory or file was created, and then trace what happened to it starting from the creation time.
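The five-step example can be replayed in a few lines to show the "start from the creation time" idea. This is a simplified sketch: events are hand-written (user, cmd, src) tuples in chronological order, who_deleted and covers are hypothetical names, and only the most recent creation of the target is considered. In a real audit log, directory creation typically shows up as cmd=mkdirs rather than cmd=create, so both are accepted here.

```python
CREATE_CMDS = {"create", "mkdirs"}

def covers(deleted, target):
    """True if deleting `deleted` also removes `target` (same path or an ancestor)."""
    base = deleted.rstrip("/")
    return target == base or target.startswith(base + "/")

def who_deleted(events, target):
    """Return the user whose delete removed the latest incarnation of `target`.

    events: chronologically ordered (user, cmd, src) tuples.
    Returns None if the target was never created or has not been deleted since.
    """
    created_at = None
    for index, (user, cmd, src) in enumerate(events):
        if cmd in CREATE_CMDS and src == target:
            created_at = index
    if created_at is None:
        return None
    # Only deletes after the creation can have removed this incarnation.
    for user, cmd, src in events[created_at + 1:]:
        if cmd == "delete" and covers(src, target):
            return user
    return None

events = [
    ("A", "create", "/foo"),
    ("A", "create", "/foo/bar"),
    ("A", "delete", "/foo"),
    ("B", "create", "/foo"),
    ("B", "delete", "/foo"),
]
print(who_deleted(events, "/foo/bar"))  # A
print(who_deleted(events, "/foo"))      # B
```

Searching backwards from the end of the log for the last delete of /foo would give B for both questions, which is exactly the mistake described above.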

As you can imagine, it is even harder to find the correct answer if the path or any of its parent/ancestor paths were moved or renamed. Rename operations need extra attention.
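One way to handle renames is to work out which names the path had earlier, and then search the log for each of those names too. A minimal sketch, assuming the rename entries have already been extracted as chronologically ordered (src, dst) pairs; path_before_rename and earlier_names are hypothetical helper names.

```python
def path_before_rename(path, src, dst):
    """If renaming src -> dst moved `path`, return its earlier name; else return it unchanged."""
    dst = dst.rstrip("/")
    if path == dst or path.startswith(dst + "/"):
        return src.rstrip("/") + path[len(dst):]
    return path

def earlier_names(renames, path):
    """Return every name the path has had, newest first.

    renames: chronologically ordered (src, dst) pairs taken from rename entries.
    """
    names = [path]
    for src, dst in reversed(renames):
        previous = path_before_rename(names[-1], src, dst)
        if previous != names[-1]:
            names.append(previous)
    return names

# Illustrative data only; these paths are invented for the example.
renames = [
    ("/data/raw", "/data/staging"),
    ("/data/staging", "/archive/2017"),
]
print(earlier_names(renames, "/archive/2017/part-0"))
```

This sketch ignores timing: it applies every rename regardless of whether the path existed at that moment, so the result still has to be cross-checked against the creation and deletion times.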