How do I identify HDFS file retention? Say I need to determine whether the file /tmp/abc.txt in HDFS has been accessed in the last 90 days, without using the Ranger audit MySQL database.
@Raja Sekhar Chintalapati I have not used it personally, but I think what you are looking for is dfs.namenode.accesstime.precision.
The access time of an HDFS file is precise up to this value. The default is 1 hour (3600000 ms), and setting it to 0 disables access times. Check this link.
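To see what the configuration reports for this property, "hdfs getconf -confKey" can read a single key. The sketch below assumes the "hdfs" CLI is on your PATH and that its client-side config matches what the NameNode is actually running with (they can differ if hdfs-site.xml was edited without a restart). The helper function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check whether access-time tracking is enabled in the HDFS config.
# Assumes the `hdfs` CLI is on PATH; helper names are hypothetical.

# Succeed if the precision value (milliseconds) is greater than 0.
# A value of 0 disables access times in HDFS.
atime_tracking_enabled() {
  [ "${1:-0}" -gt 0 ]
}

# Read the key from the client-side configuration and report it.
report_atime_precision() {
  precision=$(hdfs getconf -confKey dfs.namenode.accesstime.precision) || return 2
  if atime_tracking_enabled "$precision"; then
    echo "access time tracked, precision ${precision} ms ($(( precision / 60000 )) min)"
  else
    echo "access time tracking disabled (precision ${precision})"
  fi
}
```

Run "report_atime_precision" on a gateway node. If tracking is disabled, any access time you read back is not meaningful for the 90-day question.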
@Raja Sekhar Chintalapati As far as I know, HDFS does not track a last accessed time (atime). In fact, it is recommended to disable atime (mount with noatime) on the local disks that back HDFS. This is primarily because files are split into blocks, and individual blocks would have varying atime values. Also, the overhead of writing the atime would cause a serious performance hit. HDFS does track the last modified date, which you can see in the Ambari Files view or by running "hdfs dfs -ls </path/to/files/>".
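For a single path, "hdfs dfs -stat" prints the modification time by default (%y). Newer Hadoop releases (3.x) also document %x/%X for the access date. The sketch below assumes your release supports %X; older 2.x releases may only offer the modification-time specifiers. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: show mtime and atime of one HDFS path with `hdfs dfs -stat`.
# Assumption: your Hadoop release supports %x/%X (access time) in -stat;
# on older releases only %y/%Y (modification time) may be available.

# Whole days elapsed between an epoch-millisecond timestamp and now
# (or $2, given in epoch seconds, for reproducible checks).
days_since_ms() {
  ts_ms=$1
  now_s=${2:-$(date +%s)}
  echo $(( (now_s - ts_ms / 1000) / 86400 ))
}

# Print both timestamps, then the age of the last access in days.
show_times() {
  hdfs_path=$1
  hdfs dfs -stat "mtime=%y atime=%x" "$hdfs_path" || return 2
  atime_ms=$(hdfs dfs -stat "%X" "$hdfs_path") || return 2
  echo "last access: $(days_since_ms "$atime_ms") day(s) ago"
}
```

Usage: show_times /tmp/abc.txt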
I believe WebHDFS can return the access time when dfs.namenode.accesstime.precision is greater than 0, as @mqureshi mentioned, though I have not tried it myself. I can't comment on the performance concerns @Scott Shaw raises.
For a file:
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"

For a directory:
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
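The GETFILESTATUS reply is a JSON object whose FileStatus entry includes an "accessTime" field in milliseconds since the epoch. Here is a minimal sketch that turns that into the original 90-day check. It assumes plain HTTP WebHDFS without Kerberos (a secured cluster would need extra curl options such as --negotiate -u :), a precision greater than 0, and that sed and "date +%s" are available. check_file and the other helpers are hypothetical names, and the result is only accurate to within the configured precision window.

```shell
#!/bin/sh
# Sketch: was an HDFS file accessed in the last N days? (via WebHDFS)
# Assumes: WebHDFS enabled, no Kerberos, dfs.namenode.accesstime.precision > 0.

# Pull "accessTime" (epoch ms) out of a single GETFILESTATUS JSON reply.
# Prints nothing if the field is absent (e.g. an error response).
extract_access_time() {
  printf '%s\n' "$1" | sed -n 's/.*"accessTime":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed if access_ms falls within the last $2 days of now ($3, epoch seconds).
accessed_within_days() {
  access_ms=$1
  days=$2
  now_s=${3:-$(date +%s)}
  cutoff_ms=$(( (now_s - days * 86400) * 1000 ))
  [ "$access_ms" -ge "$cutoff_ms" ]
}

# Exit status: 0 = accessed recently, 1 = not accessed, 2 = lookup failed.
check_file() {
  host=$1; port=$2; hdfs_path=$3; window=${4:-90}
  json=$(curl -sf "http://${host}:${port}/webhdfs/v1${hdfs_path}?op=GETFILESTATUS") || return 2
  atime=$(extract_access_time "$json")
  [ -n "$atime" ] || return 2
  if accessed_within_days "$atime" "$window"; then
    echo "${hdfs_path}: accessed within the last ${window} days"
  else
    echo "${hdfs_path}: NOT accessed within the last ${window} days"
    return 1
  fi
}
```

Usage: check_file <HOST> <PORT> /tmp/abc.txt 90. For a directory, LISTSTATUS returns one FileStatus per child, so you would need a real JSON parser rather than this single-field sed extraction.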