
Audit HDFS spool logs not moving to archive folder


Hi Team,

The audit HDFS spool log files are written directly to the /var/log/hadoop/hdfs/audit/hdfs/spool directory:

[root@meybgdlpmst3] # pwd
/var/log/hadoop/hdfs/audit/hdfs/spool
[root@meybgdlpmst3(172.23.34.6)] # ls -lh
total 20G
drwxr-xr-x 2 hdfs hadoop 4.0K Jan  7 06:57 archive
-rw-r--r-- 1 hdfs hadoop  23K Jan 26 14:30 index_batch_batch.hdfs_hdfs_closed.json
-rw-r--r-- 1 hdfs hadoop 6.1K Jan 27 11:05 index_batch_batch.hdfs_hdfs.json
-rw-r--r-- 1 hdfs hadoop 7.8G Jan 25 03:43 spool_hdfs_20170124-0343.41.log
-rw-r--r-- 1 hdfs hadoop 6.6G Jan 26 03:43 spool_hdfs_20170125-0343.43.log
-rw-r--r-- 1 hdfs hadoop 3.9G Jan 27 03:44 spool_hdfs_20170126-0344.05.log
-rw-r--r-- 1 hdfs hadoop 1.6G Jan 27 11:05 spool_hdfs_20170127-0344.22.log

[root@meybgdlpmst3] # ll archive/
total 0

(Screenshot attached: 11817-spool.jpg, showing the spool directory settings under the ranger-hdfs-audit section.)

The screenshot above shows the spool directories configured under the ranger-hdfs-audit section, but the log files still never move to the archive folder, so they consume a huge amount of disk space. Is there any additional configuration that needs to be done?

Any help will be highly appreciated.

Thanks,

Rahul

4 REPLIES


@Rahul Buragohain

Based on the Ranger documentation, you should be able to achieve this by adding an additional property:

xasecure.audit.destination.hdfs.batch.filespool.archive.dir 

pointing to the folder you want.

Here is the URL I am referring to:

https://cwiki.apache.org/confluence/display/RANGER/Ranger+0.5+Audit+Configuration#Ranger0.5AuditConf...
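
For example, as a custom property under ranger-hdfs-audit (the path below is only an illustration; point it wherever you want the archived spool files to land):

xasecure.audit.destination.hdfs.batch.filespool.archive.dir=/var/log/hadoop/hdfs/audit/hdfs/spool/archive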


Those are intermediate directories used to spool the audit stream locally before it is written to HDFS.

You should have destination directories in HDFS as the final resting place.

In my experience, when this issue happens and you don't see those directories in HDFS, it is usually either a permissions problem or simply that the directories need to be created manually.

You may need to create the directories in HDFS yourself and ensure they have the proper ACLs so the audit process can write to them.
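
A minimal sketch, assuming the common /ranger/audit destination and the hdfs service user (adjust the path and owner to match your xasecure.audit.destination.hdfs.dir setting):

# run as the HDFS superuser
su - hdfs -c "hdfs dfs -mkdir -p /ranger/audit/hdfs"
su - hdfs -c "hdfs dfs -chown -R hdfs:hdfs /ranger/audit/hdfs"
su - hdfs -c "hdfs dfs -chmod -R 750 /ranger/audit/hdfs"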

(Screenshot attached: 11824-screenshot-2017-01-27-082423.png.)


@David Streever @Daniel Kozlowski

Since my NameNodes are in HA mode, I had to provide the HA nameservice in the HDFS destination property, i.e.

xasecure.audit.destination.hdfs.dir=hdfs://cluster-nameservice:8020/ranger/audit

I also added the new property:

xasecure.audit.destination.hdfs.batch.filespool.archive.dir=/var/log/hadoop/hdfs/audit/hdfs/spool/archive

Now the logs are moving to the archive folder, but the files are very large.

[root@meybgdlpmst3] # ls -lh /var/log/hadoop/hdfs/audit/hdfs/spool/archive
total 14G
-rw-r--r-- 1 hdfs hadoop 6.0G Jan 29 03:44 spool_hdfs_20170128-0344.33.log
-rw-r--r-- 1 hdfs hadoop 7.9G Jan 30 03:44 spool_hdfs_20170129-0344.37.log

Now, is there any property that compresses these log files automatically? With the logs growing like this every day, the disk will eventually fill up.

Thanks,

Rahul
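
As far as I know, Ranger's file spool has no built-in compression property for archived logs, so a common workaround is an external job that compresses (and eventually prunes) old archive files. A minimal sketch as a cron entry, assuming the archive path used above:

# /etc/cron.d/ranger-audit-spool-archive (illustrative)
# gzip archived spool logs older than one day; delete anything older than 30 days
0 5 * * * root find /var/log/hadoop/hdfs/audit/hdfs/spool/archive -name 'spool_hdfs_*.log' -mtime +1 -exec gzip {} \;
30 5 * * * root find /var/log/hadoop/hdfs/audit/hdfs/spool/archive -name 'spool_hdfs_*.log.gz' -mtime +30 -delete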
