Audit HDFS spool logs not moving into archive folder
Labels: Apache Hadoop
Created on ‎01-27-2017 07:17 AM - edited ‎08-18-2019 06:03 AM
Hi Team,
The audit HDFS spool log files land directly in the /var/log/hadoop/hdfs/audit/hdfs/spool directory:
[root@meybgdlpmst3] # pwd
/var/log/hadoop/hdfs/audit/hdfs/spool
[root@meybgdlpmst3(172.23.34.6)] # ls -lh
total 20G
drwxr-xr-x 2 hdfs hadoop 4.0K Jan  7 06:57 archive
-rw-r--r-- 1 hdfs hadoop  23K Jan 26 14:30 index_batch_batch.hdfs_hdfs_closed.json
-rw-r--r-- 1 hdfs hadoop 6.1K Jan 27 11:05 index_batch_batch.hdfs_hdfs.json
-rw-r--r-- 1 hdfs hadoop 7.8G Jan 25 03:43 spool_hdfs_20170124-0343.41.log
-rw-r--r-- 1 hdfs hadoop 6.6G Jan 26 03:43 spool_hdfs_20170125-0343.43.log
-rw-r--r-- 1 hdfs hadoop 3.9G Jan 27 03:44 spool_hdfs_20170126-0344.05.log
-rw-r--r-- 1 hdfs hadoop 1.6G Jan 27 11:05 spool_hdfs_20170127-0344.22.log
[root@meybgdlpmst3] # ll archive/
total 0
The above spool directory is configured under the ranger-hdfs-audit section (screenshot attached), but the log files still never move into the archive folder, so they consume a lot of disk space. Is there any additional configuration that needs to be done?
Any help will be highly appreciated.
Thanks,
Rahul
Created ‎01-27-2017 09:47 AM
Based on the Ranger documentation, you should be able to achieve this by adding an additional property:
xasecure.audit.destination.hdfs.batch.filespool.archive.dir
pointing to the folder you want.
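In the ranger-hdfs-audit configuration, that would look something like this (the value shown is illustrative; point it at any local directory the hdfs user can write to):

<property>
  <!-- illustrative value; use a local path with enough free disk -->
  <name>xasecure.audit.destination.hdfs.batch.filespool.archive.dir</name>
  <value>/var/log/hadoop/hdfs/audit/hdfs/spool/archive</value>
</property>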
Here is the URL I am referring to:
Created on ‎01-27-2017 01:29 PM - edited ‎08-18-2019 06:03 AM
Those are intermediate directories used to store the stream of audit activity locally before it's written to HDFS.
You should have destination directories in HDFS as the final resting place.
In my experience, when this issue happens and you don't see those directories in HDFS, it's usually a permissions problem, or the directories simply need to be created manually.
You may need to create the directories in HDFS yourself and make sure their ACLs allow the process to write to them, for example with the commands below.
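A minimal sketch, assuming the common /ranger/audit destination and the hdfs service user (match both to your xasecure.audit.destination.hdfs.dir value):

# Create the audit destination as the HDFS superuser
sudo -u hdfs hdfs dfs -mkdir -p /ranger/audit/hdfs
# Ownership/permissions so the plugin's user (hdfs here) can write
sudo -u hdfs hdfs dfs -chown -R hdfs:hdfs /ranger/audit
sudo -u hdfs hdfs dfs -chmod -R 750 /ranger/audit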
Created ‎01-30-2017 10:42 AM
@David Streever @Daniel Kozlowski
Since my NameNodes are in HA mode, I had to provide the HA nameservice in the xasecure.audit.destination.hdfs.dir property as hdfs://cluster-nameservice:8020/ranger/audit. I also added a new property:
xasecure.audit.destination.hdfs.batch.filespool.archive.dir=/var/log/hadoop/hdfs/audit/hdfs/spool/archive
Now the logs are moving into the archive folder, but the files are very big:
[root@meybgdlpmst3] # ls -lh /var/log/hadoop/hdfs/audit/hdfs/spool/archive
total 14G
-rw-r--r-- 1 hdfs hadoop 6.0G Jan 29 03:44 spool_hdfs_20170128-0344.33.log
-rw-r--r-- 1 hdfs hadoop 7.9G Jan 30 03:44 spool_hdfs_20170129-0344.37.log
Is there any property that would compress these log files automatically? With the logs growing every day, the disk will fill up at some point.
Thanks,
Rahul
Created ‎01-30-2017 05:05 PM
Rahul,
Are the logs making it to HDFS? It sounds like you might be conflating the "spooling" directory with the "local audit archive" directory. Which properties did you use during the Ranger HDFS plugin installation? Did you do a manual install, or use Ambari?
If manual, then this reference might help: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/install...
I wasn't able to locate your "...filespool.archive.dir" property on my cluster, so I'm not sure it's required; it may be what's keeping local copies of files you've already posted to HDFS. If the files are making it to HDFS, I would try removing this setting.
What do you have set for the property below? And are the contents being flushed from that location on a regular basis?
xasecure.audit.destination.hdfs.batch.filespool.dir
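A quick way to check both ends (the HDFS destination path below is an assumption based on your earlier reply; substitute your own):

# Local spool should drain after each flush to HDFS
ls -lh /var/log/hadoop/hdfs/audit/hdfs/spool
# Audit events should be accumulating on the HDFS side
hdfs dfs -ls -R /ranger/audit/hdfs | tail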
Compression doesn't happen during this process. Once the files are on HDFS, you're free to do with them as you see fit; if compression is part of that, write an MR job to do so, for example the streaming sketch below. (WARNING: this could affect other systems that might want to use these files as-is.)
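One possible shape for that job, using Hadoop Streaming as an identity, map-only pass that rewrites a day's audit logs with gzip output (the jar path and the /ranger/audit/hdfs paths are assumptions; adjust to your cluster):

# Identity map-only job: re-emit each record, gzip-compressed on output.
# NOTE: jar path and input/output paths are assumptions for this sketch.
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
  -Dmapreduce.job.reduces=0 \
  -Dmapreduce.output.fileoutputformat.compress=true \
  -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -mapper cat \
  -input /ranger/audit/hdfs/20170128 \
  -output /ranger/audit/hdfs/20170128-gz

Keep in mind that gzip output isn't splittable, so any later job will read each compressed file with a single mapper; that's the trade-off for the smaller footprint.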
Cheers,
David
