Support Questions
Find answers, ask questions, and share your expertise

Audit HDFS spool logs not moving to archive folder




Hi Team,

The audit HDFS spool log files are being written directly into the /var/log/hadoop/hdfs/audit/hdfs/spool directory.

[root@meybgdlpmst3] # pwd
/var/log/hadoop/hdfs/audit/hdfs/spool
[root@meybgdlpmst3(172.23.34.6)] # ls -lh
total 20G
drwxr-xr-x 2 hdfs hadoop 4.0K Jan  7 06:57 archive
-rw-r--r-- 1 hdfs hadoop  23K Jan 26 14:30 index_batch_batch.hdfs_hdfs_closed.json
-rw-r--r-- 1 hdfs hadoop 6.1K Jan 27 11:05 index_batch_batch.hdfs_hdfs.json
-rw-r--r-- 1 hdfs hadoop 7.8G Jan 25 03:43 spool_hdfs_20170124-0343.41.log
-rw-r--r-- 1 hdfs hadoop 6.6G Jan 26 03:43 spool_hdfs_20170125-0343.43.log
-rw-r--r-- 1 hdfs hadoop 3.9G Jan 27 03:44 spool_hdfs_20170126-0344.05.log
-rw-r--r-- 1 hdfs hadoop 1.6G Jan 27 11:05 spool_hdfs_20170127-0344.22.log

[root@meybgdlpmst3] # ll archive/
total 0


The spool directories above are configured under the ranger-hdfs-audit section, but the log files still never move into the archive folder, so they consume too much disk space. Is there any additional configuration that needs to be done?

Any help will be highly appreciated.

Thanks,

Rahul

1 ACCEPTED SOLUTION


Re: Audit HDFS spool logs not moving to archive folder

Rahul,

Are the logs making it to HDFS? It sounds like you might be conflating the "spooling" directory with the "local audit archive" directory. What properties did you use during the Ranger HDFS Plugin installation? Are you doing a manual install or using Ambari?

If manual, then this reference might help: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/install...

I wasn't able to locate your "...filespool.archive.dir" property on my cluster. I'm not sure the property is required, and it may be responsible for keeping local copies of files that have already been posted to HDFS. If the files are making it to HDFS, I would try removing this setting.

What do you have set for the property below? And are the contents being flushed from that location on a regular basis?

xasecure.audit.destination.hdfs.batch.filespool.dir

Compression doesn't happen during this process. Once the files are on HDFS, you're free to do with them as you see fit. If compression is part of that, then write an MR job to do so. (Warning: this could affect other systems that want to consume these files as-is.)
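A rough sketch of that MR approach using Hadoop Streaming as an identity job with compressed output (the jar path and the input/output directories are assumptions for an HDP-style layout; adjust to your cluster):

```
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -mapper /bin/cat -reducer /bin/cat \
  -input /ranger/audit/hdfs \
  -output /ranger/audit/hdfs-compressed
```

The mapper and reducer simply pass lines through; the compression settings make the framework gzip the job's output.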

Cheers,

David

4 REPLIES

Re: Audit HDFS spool logs not moving to archive folder

@Rahul Buragohain

Based on the Ranger documentation, you should be able to achieve this by adding an additional property:

xasecure.audit.destination.hdfs.batch.filespool.archive.dir 

pointing to the folder you want.

Here is the URL I am referring to:

https://cwiki.apache.org/confluence/display/RANGER/Ranger+0.5+Audit+Configuration#Ranger0.5AuditConf...
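Per that wiki page, the spool-related entries under ranger-hdfs-audit would look roughly like this (the paths and the max.files value are illustrative assumptions, not your cluster's actual settings):

```properties
# Local directory where audit events are spooled when the HDFS destination is unavailable
xasecure.audit.destination.hdfs.batch.filespool.dir=/var/log/hadoop/hdfs/audit/hdfs/spool
# Local directory where delivered spool files are moved afterwards
xasecure.audit.destination.hdfs.batch.filespool.archive.dir=/var/log/hadoop/hdfs/audit/hdfs/spool/archive
# How many archived spool files to retain locally
xasecure.audit.destination.hdfs.batch.filespool.archive.max.files=64
```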


Re: Audit HDFS spool logs not moving to archive folder

Those are intermediate directories used to store the stream of activity locally, before it's written to HDFS.

You should have destination directories in HDFS for the final resting place.

In my experience, when this issue happens and you don't see those directories in HDFS, it is usually either a permissions problem or simply that the directories need to be created manually.

You may need to create the directories in HDFS yourself and ensure they have the proper ACLs so the process can write to them.
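For example, assuming the default /ranger/audit destination and that the plugin writes as the hdfs service user (both are assumptions; adjust to your environment):

```
hdfs dfs -mkdir -p /ranger/audit/hdfs
hdfs dfs -chown hdfs:hdfs /ranger/audit/hdfs
hdfs dfs -chmod 750 /ranger/audit/hdfs
```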


Re: Audit HDFS spool logs not moving to archive folder


@David Streever @Daniel Kozlowski

Since my NameNodes are in HA mode, I had to provide the HA nameservice in the xasecure.audit.destination.hdfs.dir property as hdfs://cluster-nameservice:8020/ranger/audit. I also added a new property:

xasecure.audit.destination.hdfs.batch.filespool.archive.dir=/var/log/hadoop/hdfs/audit/hdfs/spool/archive

Now the logs are moving into the archive folder, but the files are very big.

[root@meybgdlpmst3] # ls -lh /var/log/hadoop/hdfs/audit/hdfs/spool/archive
total 14G
-rw-r--r-- 1 hdfs hadoop 6.0G Jan 29 03:44 spool_hdfs_20170128-0344.33.log
-rw-r--r-- 1 hdfs hadoop 7.9G Jan 30 03:44 spool_hdfs_20170129-0344.37.log

Is there any property that compresses these log files automatically? At this rate of daily growth, the disk will eventually fill up.
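In the meantime, as a stop-gap I am considering a small cron job (my own sketch, not a Ranger feature; the directory and the one-day age threshold are assumptions) that gzips closed archive files and removes the originals:

```python
import gzip
import os
import shutil
import time

def compress_old_logs(archive_dir, max_age_days=1):
    """Gzip *.log files older than max_age_days, then delete the originals."""
    cutoff = time.time() - max_age_days * 86400
    for name in os.listdir(archive_dir):
        path = os.path.join(archive_dir, name)
        if not (name.endswith(".log") and os.path.isfile(path)):
            continue  # skip subdirectories and already-compressed files
        if os.path.getmtime(path) > cutoff:
            continue  # too recent; the writer may still be touching it
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream-copy, so large files fit in memory
        os.remove(path)
```

Run against /var/log/hadoop/hdfs/audit/hdfs/spool/archive from cron (e.g. daily) to keep only gzipped copies of older spool files.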

Thanks,

Rahul
