Created on 02-22-2018 11:02 PM - edited 02-22-2018 11:05 PM
Hi,
Is there a way to audit these commands in Navigator or any other place?
FS shell commands:
copyFromLocal, copyToLocal, cp, moveFromLocal, moveToLocal
Created on 02-27-2018 07:07 AM - edited 02-27-2018 07:08 AM
Hi csguna,
Navigator audit covers the Hadoop master roles only, and the HDFS shell commands behave as a regular HDFS client from the NameNode's perspective.
On the NameNode side, where the HDFS audit logs are generated, it is not possible to determine why a client wants to read a file. The only thing the NameNode knows and can log is that a client/user wants to open and read a file; there is no information about what the client will actually do with the data. The client could save the data to a local disk, send it to a network service, simply display the contents of the file, or run an ordinary ETL job and write the results back to HDFS, etc.
That is why an "open" operation is logged for both 'hadoop fs -cat size.log' and 'hadoop fs -get size.log'.
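For example, both commands below produce essentially the same entry in the NameNode's hdfs-audit.log (the timestamp, user, IP, and path are illustrative, and the exact set of fields varies by Hadoop version):

    hadoop fs -cat size.log    # read the file and print it to stdout
    hadoop fs -get size.log    # read the file and save it to the local disk

    # Both appear in hdfs-audit.log as the same "open" operation:
    2018-02-27 07:07:07,123 INFO FSNamesystem.audit: allowed=true  ugi=csguna (auth:SIMPLE)  ip=/10.0.0.1  cmd=open  src=/user/csguna/size.log  dst=null  perm=null

The cmd field records only the HDFS operation (open, create, rename, etc.), so the audit trail cannot distinguish a -cat from a -get or a copyToLocal.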
Therefore this is not currently possible with Navigator Audit, as the knowledge of what the client will do with the data read from HDFS is missing.
Usually there are ways on the OS level itself to audit what users/processes do (like the Linux audit framework), and those can be used to audit file access on the OS level. It might be possible to combine audit data from the OS and Navigator to pinpoint the operations you mentioned, but I do not know of any automated way to do that.
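As a rough sketch of the OS-level approach (the watched path and key name below are made up for illustration, not taken from this thread), the Linux audit framework could watch the local directory that files are copied to or from:

    # Hypothetical auditd rule: watch a local staging directory
    # for reads, writes, and attribute changes, tagged with a key
    auditctl -w /data/staging -p rwa -k hdfs-local-copy

    # Later, list the recorded events for that key
    ausearch -k hdfs-local-copy

Correlating these OS-side events with Navigator's HDFS "open"/"create" events by user and timestamp would still be a manual exercise.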