Community Articles

SK1 · ‎02-27-2017

Sometimes you need the last access time of a file in HDFS. You can get it in the following ways:

Option 1: Ranger audit. You can get the access time from the Ranger audit, but I would not recommend it, because the audit output is cluttered with the many tmp files and directories that are present in the audit DB.

So to meet this requirement I used the Java APIs instead, which got the job done efficiently and cleanly.

Option 2: Use Java and build a small program. A short Java program can also retrieve other useful information about an HDFS file in just a few lines.

Step 1: Create the Java program:

package com.saurabh;

import java.io.File;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Accesstime {
	public static void main(String[] args) throws Exception {
		System.out.println("usage: hadoop jar accessTime.jar <local file-path>");
		System.out.println("********************************************************************");
		System.out.println("Owner,LastAccessed(Days),LastAccessed(Date),FileName");
		System.out.println("********************************************************************");
		if (args.length == 0) {
			System.out.println("Please provide the absolute file path.");
			return;
		}
		final String delimiter = ",";
		// Formats the access time (milliseconds since the epoch) as a readable date.
		final DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
		final long now = System.currentTimeMillis();
		List<String> inputLines = new ArrayList<String>();
		// try-with-resources closes the scanner on the local input file.
		try (Scanner myScanner = new Scanner(new File(args[0]))) {
			FileSystem fs = FileSystem.get(new Configuration());
			while (myScanner.hasNextLine()) {
				String line = myScanner.nextLine().trim();
				if (line.isEmpty()) {
					continue; // skip blank lines in the input file
				}
				FileStatus status = fs.getFileStatus(new Path(line));
				String owner = status.getOwner();
				// HDFS records access time with the precision set by
				// dfs.namenode.accesstime.precision (1 hour by default; 0 disables it).
				long lastAccessTimeLong = status.getAccessTime();
				long diff = now - lastAccessTimeLong;
				inputLines.add(owner + delimiter
						+ TimeUnit.DAYS.convert(diff, TimeUnit.MILLISECONDS) + delimiter
						+ df.format(new Date(lastAccessTimeLong)) + delimiter + line);
			}
		} catch (Exception e) {
			System.out.println("Could not read the input file or fetch the HDFS file status");
			e.printStackTrace();
			return;
		}
		// Sort by the LastAccessed(Days) column, least recently accessed first.
		Collections.sort(inputLines, new Comparator<String>() {
			public int compare(String line1, String line2) {
				long days1 = Long.parseLong(line1.split(delimiter)[1].trim());
				long days2 = Long.parseLong(line2.split(delimiter)[1].trim());
				return Long.compare(days2, days1);
			}
		});
		for (String outputLine : inputLines) {
			System.out.println(outputLine);
		}
	}
}
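The program above sorts its report by re-parsing the days column out of each formatted line. As a rough alternative sketch, the same day calculation and ordering can be done on typed records before any formatting happens. The `AccessRecord` class and its method names are hypothetical and not part of the original program, and the date is rendered in UTC here (an assumption made for repeatable output), whereas the program above uses the JVM default time zone. It needs only the JDK, so it can be tried without a cluster.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: holds the values the program reads from FileStatus.
class AccessRecord {
    final String owner;
    final long accessTimeMillis; // value returned by FileStatus.getAccessTime()
    final String path;

    AccessRecord(String owner, long accessTimeMillis, String path) {
        this.owner = owner;
        this.accessTimeMillis = accessTimeMillis;
        this.path = path;
    }

    // Whole days elapsed between the last access and nowMillis (both epoch ms).
    long daysSince(long nowMillis) {
        return TimeUnit.MILLISECONDS.toDays(nowMillis - accessTimeMillis);
    }

    // Same column layout as the report: Owner,Days,Date,FileName (date in UTC).
    String toCsv(long nowMillis) {
        String date = Instant.ofEpochMilli(accessTimeMillis)
                .atZone(ZoneOffset.UTC).toLocalDate().toString();
        return owner + "," + daysSince(nowMillis) + "," + date + "," + path;
    }

    // Returns a sorted copy, least recently accessed first.
    static List<AccessRecord> sortLeastRecentFirst(List<AccessRecord> records) {
        List<AccessRecord> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparingLong((AccessRecord r) -> r.accessTimeMillis));
        return sorted;
    }

    public static void main(String[] args) {
        long now = Instant.parse("2017-02-26T12:00:00Z").toEpochMilli();
        List<AccessRecord> records = new ArrayList<>();
        records.add(new AccessRecord("raghu",
                Instant.parse("2017-02-10T08:00:00Z").toEpochMilli(),
                "/user/raghu/wordcount_in/words.txt"));
        records.add(new AccessRecord("saurkuma",
                Instant.parse("2017-02-06T08:00:00Z").toEpochMilli(),
                "/user/raghu/wordcount_out/_SUCCESS"));
        for (AccessRecord r : sortLeastRecentFirst(records)) {
            System.out.println(r.toCsv(now));
        }
    }
}
```

Sorting on the raw timestamp avoids splitting strings inside the comparator, and it keeps working if a file name or owner ever contains the delimiter.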

Step 2: Export the program to a jar and copy the jar file to your cluster.

Step 3: Create a local file that lists the absolute HDFS file paths, one per line.

[root@m1 ~]# cat input.txt
/user/raghu/wordcount_in/words.txt
/user/raghu/wordcount_out/_SUCCESS
/user/raghu/wordcount_out/part-r-00000

Step 4: Run the jar file. It prints all the required details (Owner, LastAccessed(Days), LastAccessed(Date), FileName):

[root@m1 ~]# hadoop jar accessTime.jar input.txt
usage: hadoop jar accessTime.jar <local file-path>
********************************************************************
Owner,LastAccessed(Days),LastAccessed(Date),FileName
********************************************************************
saurkuma,20,2017-02-06,/user/raghu/wordcount_out/_SUCCESS
raghu,16,2017-02-10,/user/raghu/wordcount_in/words.txt
raghu,16,2017-02-10,/user/raghu/wordcount_out/part-r-00000
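Because the report is plain comma-separated text, it is easy to post-process. The following is a minimal sketch, assuming the output has been captured as a list of lines, that picks out the files not accessed for at least a given number of days. The `StaleFileFilter` class and the `pathsOlderThan` method are hypothetical names introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing helper for the report printed in Step 4.
class StaleFileFilter {

    // Returns the FileName column of every data row whose
    // LastAccessed(Days) value is at least minDays.
    static List<String> pathsOlderThan(List<String> reportLines, long minDays) {
        List<String> result = new ArrayList<>();
        for (String line : reportLines) {
            // A limit of 4 keeps any commas inside the file name intact.
            String[] fields = line.split(",", 4);
            if (fields.length < 4) {
                continue; // usage line, banner of asterisks, or blank line
            }
            long days;
            try {
                days = Long.parseLong(fields[1].trim());
            } catch (NumberFormatException e) {
                continue; // header row: second column is not a number
            }
            if (days >= minDays) {
                result.add(fields[3]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> report = Arrays.asList(
                "usage: hadoop jar accessTime.jar <local file-path>",
                "Owner,LastAccessed(Days),LastAccessed(Date),FileName",
                "saurkuma,20,2017-02-06,/user/raghu/wordcount_out/_SUCCESS",
                "raghu,16,2017-02-10,/user/raghu/wordcount_in/words.txt",
                "raghu,16,2017-02-10,/user/raghu/wordcount_out/part-r-00000");
        for (String path : pathsOlderThan(report, 20)) {
            System.out.println(path);
        }
    }
}
```

With the sample report above and a threshold of 20 days, only the `_SUCCESS` file qualifies.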


How to get last access time of any files in hdfs
