Support Questions

Find answers, ask questions, and share your expertise

Is there any way to get the last access time of HDFS files?

Guru

Team,

I have a requirement where I need the last accessed time of a file. I tried to get it from the Ranger audit DB, but there are so many tmp files and dirs that it is creating an issue.

So is there any other way, such as a command or an API, to get the last access time?

1 ACCEPTED SOLUTION

Master Mentor

@Saurabh

Following is a simple example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// For date conversion from epoch millis to a human-readable string.
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

public class FileStatusChecker {
    public static void main(String[] args) throws Exception {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            String hdfsFilePath = "hdfs://My-NN-HA/Demos/SparkDemos/inputFile.txt"; // pass in your own HDFS path
            FileStatus[] status = fs.listStatus(new Path(hdfsFilePath));

            DateFormat df = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z");
            for (FileStatus file : status) {
                // getAccessTime() returns 0 if dfs.namenode.accesstime.precision is disabled.
                long lastAccessTimeLong = file.getAccessTime();
                Date lastAccessTimeDate = new Date(lastAccessTimeLong);
                System.out.println("The file '" + file.getPath() + "' was accessed last at: " + df.format(lastAccessTimeDate));
            }
        } catch (Exception e) {
            System.out.println("Unable to read the file status");
            e.printStackTrace();
        }
    }
}
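
To compile and run this against your cluster, something like the following usually works (a sketch, assuming the hadoop command is on your PATH so the client classpath can be resolved):

javac -cp $(hadoop classpath) FileStatusChecker.java
java -cp $(hadoop classpath):. FileStatusChecker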


6 REPLIES

Master Mentor

@Saurabh

One approach is to look at the "/var/log/hadoop/hdfs/hdfs-audit.log" log file and search for the operation "cmd=open" on the file in question.

For example, if I want to find when an "open" request was issued for the file "/Demos/SparkDemos/inputFile.txt", then in hdfs-audit.log I can find the following kind of entry with the timestamp:

tail -f /var/log/hadoop/hdfs/hdfs-audit.log

2017-01-31 07:04:07,766 INFO FSNamesystem.audit: allowed=true    ugi=admin (auth:PROXY) via root (auth:SIMPLE)    ip=/172.26.70.151    cmd=open    src=/Demos/SparkDemos/inputFile.txt    dst=null    perm=null    proto=webhdfs
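
To pull out just these events for a given file instead of tailing the whole log, a simple filter works (a sketch; the log and file paths are the ones from this example and will differ on your cluster):

grep "cmd=open" /var/log/hadoop/hdfs/hdfs-audit.log | grep "src=/Demos/SparkDemos/inputFile.txt"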


Guru

@Jay SenSharma: Can you please also help me get the last accessed time for a directory? The above is not working for a dir.

Super Guru

@Saurabh

What is the value of the following property?

dfs.namenode.accesstime.precision

Check the docs here: https://hadoop.apache.org/docs/r2.6.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Use FileStatus.getAccessTime() to get the last access time. It depends on the precision set by the property above: if it is currently set to zero, access times are not maintained at all; if it is left at the default of one hour, you get the access time up to a precision of one hour; and if you have set your own precision, you get whatever you have set.

https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/fs/FileStatus.html
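
You can check the effective value on your cluster with the standard hdfs getconf utility (a quick sketch; run it on a node that has the HDFS client configuration):

hdfs getconf -confKey dfs.namenode.accesstime.precision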

Guru

@mqureshi: This property was set to 0, but I later changed it, and I used Jay's suggestion to make it work. Thanks for your help.

Guru

Thank you so much @Jay SenSharma. It is working fine, and I also parameterized it to take the file path.