Is there any way to get the last access time of HDFS files?
Labels: Apache Ranger, HDFS
Created on 01-31-2017 06:43 AM - edited 09-16-2022 03:59 AM
Team,
I have a requirement where I need the last accessed time of a file. I tried to get it from the Ranger audit DB, but there are so many tmp files and dirs that it is creating an issue.
So is there any other way, such as a command or API, to get the last access time?
Created 01-31-2017 08:17 AM
Following is a simple example:
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileStatusChecker {
    public static void main(String[] args) throws Exception {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            // You need to pass in your own HDFS path here.
            String hdfsFilePath = "hdfs://My-NN-HA/Demos/SparkDemos/inputFile.txt";
            FileStatus[] status = fs.listStatus(new Path(hdfsFilePath));
            for (int i = 0; i < status.length; i++) {
                // Date conversion from long to human readable.
                long lastAccessTimeLong = status[i].getAccessTime();
                Date lastAccessTimeDate = new Date(lastAccessTimeLong);
                DateFormat df = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z");
                System.out.println("The file '" + hdfsFilePath + "' was accessed last at: " + df.format(lastAccessTimeDate));
            }
        } catch (Exception e) {
            System.out.println("File not found");
            e.printStackTrace();
        }
    }
}
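Note that listStatus() on a path that names a single file returns a one-element array, while on a directory it returns one entry per child. Also, the resolution of the value returned by getAccessTime() is governed by the dfs.namenode.accesstime.precision property discussed further down in this thread.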
Created 01-31-2017 07:08 AM
One approach is to look at the /var/log/hadoop/hdfs/hdfs-audit.log file and search for the operation cmd=open on the file in question.
For example, if I want to know when an "open" request was issued for the file /Demos/SparkDemos/inputFile.txt, then in hdfs-audit.log I can find the following kind of entry with its timestamp:
tail -f /var/log/hadoop/hdfs/hdfs-audit.log
2017-01-31 07:04:07,766 INFO FSNamesystem.audit: allowed=true ugi=admin (auth:PROXY) via root (auth:SIMPLE) ip=/172.26.70.151 cmd=open src=/Demos/SparkDemos/inputFile.txt dst=null perm=null proto=webhdfs
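On a busy cluster the live tail can be noisy; a variation (assuming the same default log location as above) is to filter the full audit log for open events on the one file:
grep 'cmd=open' /var/log/hadoop/hdfs/hdfs-audit.log | grep 'src=/Demos/SparkDemos/inputFile.txt'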
Created 02-02-2017 10:07 AM
@Jay SenSharma: Can you please help me get the last accessed time for a directory as well? The above is not working for directories.
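For context: HDFS does not maintain access times for directories, so FileStatus.getAccessTime() on a directory typically returns 0. A minimal sketch of a workaround, assuming the goal is the most recent access time among the files directly inside the directory (the class name and path below are hypothetical, following the example above):
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirAccessTimeChecker {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical directory path; replace with your own.
        Path dir = new Path("hdfs://My-NN-HA/Demos/SparkDemos");
        long latest = 0L;
        // Directories carry no access time in HDFS, so derive one
        // from the most recently accessed file inside the directory.
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isFile() && status.getAccessTime() > latest) {
                latest = status.getAccessTime();
            }
        }
        if (latest > 0) {
            SimpleDateFormat df = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z");
            System.out.println("Most recent file access under '" + dir + "': " + df.format(new Date(latest)));
        } else {
            System.out.println("No file access times recorded under '" + dir + "'");
        }
    }
}
This only looks at the directory's immediate children; a recursive walk would be needed to cover nested subdirectories.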
Created 01-31-2017 07:13 AM
What is the value of the following property?
dfs.namenode.accesstime.precision
Check the docs here --> https://hadoop.apache.org/docs/r2.6.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Use FileStatus.getAccessTime() to get the last access time. The value depends on the precision set by the property above: if it is set to zero, access times are not recorded at all; if it is left at the default of one hour, you get access times at one-hour precision; and if you have set your own precision, you get whatever you have set.
https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/fs/FileStatus.html
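For example, to record access times at the default one-hour granularity, the property can be set in hdfs-site.xml as sketched below (value in milliseconds; changing it typically requires a NameNode restart):
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
  <!-- 3600000 ms = 1 hour; a value of 0 disables access times entirely -->
</property>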
Created 02-01-2017 10:09 AM
@mqureshi: This property was set to 0, but I later changed it and used Jay's suggestion to make it work. Thanks for your help.
Created 02-01-2017 10:08 AM
Thank you so much @Jay SenSharma. It is working fine, and I also parameterized it to take the file path as an argument.
