Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

What are HDFS NFS "access times"?

avatar
Expert Contributor

Having a problem with HDFS NFS, addressed on another site where it is recommended to set hdfs-site.xml like...

<property> 
<name>dfs.namenode.accesstime.precision</name> 
<value>3600000</value> 
<description> The access time for HDFS file is precise upto this value. The default value is 1 hour. Setting a value of 0 disables access times for HDFS. </description> 
</property>

Am confused about what exactly "access times for HDFS" means / is. Looking at the hadoop docs, was still not able to determine. Could someone give better understanding as to what this is doing? Also where is the nfs3 daemon log file ?

1 ACCEPTED SOLUTION

avatar
Expert Contributor

Here is the answer that I was told from discussion on the apache hadoop mailing list:

I think access time refers to the POSIX atime attribute for files, the “time of last access” as described here for instance [1]. While HDFS keeps a correct modification time (mtime), which is important, easy and cheap, it only keeps a very low-resolution sense of last access time, which is less important, and expensive to monitor and record, as described here [2] and here [3]. It doesn’t even expose this low-rez atime value in the hadoop fs -stat command; you need to use Java if you want to read it from HDFS apis.


However, to have a conforming NFS api, you must present atime, and so the HDFS NFS implementation does. But first you have to configure it on. The documentation says that the default value is 3,600,000 milliseconds (1 hour), but many sites have been advised to turn it off entirely by setting it to zero, to improve HDFS overall performance. See for example here ( [4], section "Don’t let Reads become Writes”). So if your site has turned off atime in HDFS, you will need to turn it back on to fully enable NFS. Alternatively, you can maintain optimum efficiency by mounting NFS with the “noatime” option, as described in the document you reference.


I don’t know where the nfs3 daemon log file is, but it is almost certainly on the server node where you’ve configured the NFS service to be served from. Log into it and check under /var/log, eg with find /var/log -name ‘*nfs3*’ -print

[1] https://www.unixtutorial.org/atime-ctime-mtime-in-unix-filesystems

[2] https://issues.apache.org/jira/browse/HADOOP-1869

[3] https://superuser.com/questions/464290/why-is-cat-not-changing-the-access-time

[4] https://community.hortonworks.com/articles/43861/scaling-the-hdfs-namenode-part-4-avoiding-performa....

View solution in original post

1 REPLY 1

avatar
Expert Contributor

Here is the answer that I was told from discussion on the apache hadoop mailing list:

I think access time refers to the POSIX atime attribute for files, the “time of last access” as described here for instance [1]. While HDFS keeps a correct modification time (mtime), which is important, easy and cheap, it only keeps a very low-resolution sense of last access time, which is less important, and expensive to monitor and record, as described here [2] and here [3]. It doesn’t even expose this low-rez atime value in the hadoop fs -stat command; you need to use Java if you want to read it from HDFS apis.


However, to have a conforming NFS api, you must present atime, and so the HDFS NFS implementation does. But first you have to configure it on. The documentation says that the default value is 3,600,000 milliseconds (1 hour), but many sites have been advised to turn it off entirely by setting it to zero, to improve HDFS overall performance. See for example here ( [4], section "Don’t let Reads become Writes”). So if your site has turned off atime in HDFS, you will need to turn it back on to fully enable NFS. Alternatively, you can maintain optimum efficiency by mounting NFS with the “noatime” option, as described in the document you reference.


I don’t know where the nfs3 daemon log file is, but it is almost certainly on the server node where you’ve configured the NFS service to be served from. Log into it and check under /var/log, eg with find /var/log -name ‘*nfs3*’ -print

[1] https://www.unixtutorial.org/atime-ctime-mtime-in-unix-filesystems

[2] https://issues.apache.org/jira/browse/HADOOP-1869

[3] https://superuser.com/questions/464290/why-is-cat-not-changing-the-access-time

[4] https://community.hortonworks.com/articles/43861/scaling-the-hdfs-namenode-part-4-avoiding-performa....