Created 05-09-2016 08:05 AM
Is there any best practices on how to handle the ranger_audit.xa_access_audit table over time when it's growing big. In particular can we truncate/delete rows without affecting lineage?
Created 05-09-2016 10:54 AM
I've been advising users to truncate the table by removing entries older than 1 or 2 months. The DB is used to support search in Ranger UI, and the events are independent, so, IMO, no harm is done. They are also storing audit data in hdfs as well, and all data remains available there if needed. When you truncate the table by date, you can use the event_time field, for example in case of Mysql
delete from xa_access_audit where event_time < DATE_SUB(now(), INTERVAL 1 MONTH);
Created 05-09-2016 10:54 AM
I've been advising users to truncate the table by removing entries older than 1 or 2 months. The DB is used to support search in Ranger UI, and the events are independent, so, IMO, no harm is done. They are also storing audit data in hdfs as well, and all data remains available there if needed. When you truncate the table by date, you can use the event_time field, for example in case of Mysql
delete from xa_access_audit where event_time < DATE_SUB(now(), INTERVAL 1 MONTH);
Created 05-10-2016 02:26 PM
@Predrag Minovic - Is it possible to remove the auditlog in hdfs as well without affecting lineage etc?
Created 05-16-2016 12:39 PM
Hi @Mats Johansson, if you use Ranger for lineage, like who did what and when to certain files or tables or databases, then have to be careful what are you deleting. You can then go for more restrictive deletions from the Audit DB if you want to keep on using it to browse the lineage. Or delete from Range DB after 1 or 2 months but keep all data in HDFS. To reduce the amount of audit data, you can also revisit your security policies and disable audit on policies which are not critical. Also, a new version of Atlas is coming, which will work together with Ranger to provide more cool features for lineage.