
Is there any way to get the last access time of HDFS files?

Solved

Guru

Team,

I have a requirement where I need the last accessed time of a file. I tried to get it from the Ranger audit DB, but there are so many tmp files and dirs that they are causing an issue.

So is there any other way, such as a command or API, to get the last access time?

1 ACCEPTED SOLUTION


Re: Is there any way to get the last access time of HDFS files?

Super Mentor

@Saurabh

Following is a simple example:

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileStatusChecker {
    public static void main(String[] args) throws Exception {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            String hdfsFilePath = "hdfs://My-NN-HA/Demos/SparkDemos/inputFile.txt";  // you need to pass in your hdfs path

            FileStatus[] status = fs.listStatus(new Path(hdfsFilePath));
            DateFormat df = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z");

            for (FileStatus s : status) {
                // getAccessTime() returns milliseconds since the epoch;
                // convert it to a human-readable date.
                Date lastAccessTimeDate = new Date(s.getAccessTime());
                System.out.println("The file '" + s.getPath()
                        + "' was accessed last at: " + df.format(lastAccessTimeDate));
            }
        } catch (Exception e) {
            System.out.println("File not found");
            e.printStackTrace();
        }
    }
}
6 REPLIES

Re: Is there any way to get the last access time of HDFS files?

Super Mentor

@Saurabh

One approach is to look at the "/var/log/hadoop/hdfs/hdfs-audit.log" log file and search for the operation "cmd=open" for the file in question.

For example, if I want to find when an "open" request was made for the file "/Demos/SparkDemos/inputFile.txt", then in hdfs-audit.log I can find the following kind of entry with the timestamp:

tail -f /var/log/hadoop/hdfs/hdfs-audit.log

2017-01-31 07:04:07,766 INFO FSNamesystem.audit: allowed=true    ugi=admin (auth:PROXY) via root (auth:SIMPLE)    ip=/172.26.70.151    cmd=open    src=/Demos/SparkDemos/inputFile.txt    dst=null    perm=null    proto=webhdfs
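As a sketch, the last read time can be pulled out of the audit log with grep and awk. The demo below runs against a sample log built from entries like the one above; on a real cluster, point AUDIT_LOG at the actual hdfs-audit.log instead.

```shell
# Demo against a sample audit log; on a real cluster set
# AUDIT_LOG=/var/log/hadoop/hdfs/hdfs-audit.log instead.
AUDIT_LOG=$(mktemp)
cat > "$AUDIT_LOG" <<'EOF'
2017-01-31 06:00:00,000 INFO FSNamesystem.audit: allowed=true	ugi=admin	ip=/172.26.70.151	cmd=open	src=/Demos/SparkDemos/other.txt	dst=null	perm=null	proto=webhdfs
2017-01-31 07:04:07,766 INFO FSNamesystem.audit: allowed=true	ugi=admin	ip=/172.26.70.151	cmd=open	src=/Demos/SparkDemos/inputFile.txt	dst=null	perm=null	proto=webhdfs
EOF

# Keep only "open" (read) entries for the target file; the newest
# entry's leading timestamp is the last read time.
grep 'cmd=open' "$AUDIT_LOG" | grep 'src=/Demos/SparkDemos/inputFile.txt' | tail -n 1 | awk '{print $1, $2}'
```

Since the audit log only records operations while it is retained, this gives the last access within the log's rotation window, not across the file's whole history.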


Re: Is there any way to get the last access time of HDFS files?

Guru

@Jay SenSharma: Can you please help me get the last accessed time for directories as well? The above is not working for a dir.

Re: Is there any way to get the last access time of HDFS files?

Super Guru

@Saurabh

What is the value of the following property?

dfs.namenode.accesstime.precision

Check the docs here --> https://hadoop.apache.org/docs/r2.6.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Use FileStatus.getAccessTime() to get the last access time. The value depends on the precision set above: if it is currently set to zero, access times are not recorded at all; if it is left at the default of one hour, you get access times accurate to within one hour; and if you have set your own precision, you get whatever you have set.
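For reference, the property is set in hdfs-site.xml; a minimal fragment (3600000 ms is the one-hour default) looks like this:

```xml
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <!-- Access-time update precision in milliseconds.
       3600000 (1 hour) is the default; 0 disables access times. -->
  <value>3600000</value>
</property>
```

Note that changing it requires a NameNode restart to take effect.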

https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/fs/FileStatus.html

Re: Is there any way to get the last access time of HDFS files?

Guru

@mqureshi: This property was set to 0, but I later changed it and used Jay's suggestion to make it work. Thanks for your help.


Re: Is there any way to get the last access time of HDFS files?

Guru

Thank you so much @Jay SenSharma. It is working fine, and I also parameterized the file path.
