Support Questions
Find answers, ask questions, and share your expertise

Can the ranger_audit.xa_access_audit table be truncated without affecting lineage?

Expert Contributor

Is there any best practices on how to handle the ranger_audit.xa_access_audit table over time when it's growing big. In particular can we truncate/delete rows without affecting lineage?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Can the ranger_audit.xa_access_audit table be truncated without affecting lineage?

I've been advising users to truncate the table by removing entries older than 1 or 2 months. The DB is used to support search in Ranger UI, and the events are independent, so, IMO, no harm is done. They are also storing audit data in hdfs as well, and all data remains available there if needed. When you truncate the table by date, you can use the event_time field, for example in case of Mysql

delete from xa_access_audit where event_time < DATE_SUB(now(), INTERVAL 1 MONTH);

View solution in original post

3 REPLIES 3

Re: Can the ranger_audit.xa_access_audit table be truncated without affecting lineage?

I've been advising users to truncate the table by removing entries older than 1 or 2 months. The DB is used to support search in Ranger UI, and the events are independent, so, IMO, no harm is done. They are also storing audit data in hdfs as well, and all data remains available there if needed. When you truncate the table by date, you can use the event_time field, for example in case of Mysql

delete from xa_access_audit where event_time < DATE_SUB(now(), INTERVAL 1 MONTH);

View solution in original post

Re: Can the ranger_audit.xa_access_audit table be truncated without affecting lineage?

Expert Contributor

@Predrag Minovic - Is it possible to remove the auditlog in hdfs as well without affecting lineage etc?

Re: Can the ranger_audit.xa_access_audit table be truncated without affecting lineage?

Hi @Mats Johansson, if you use Ranger for lineage, like who did what and when to certain files or tables or databases, then have to be careful what are you deleting. You can then go for more restrictive deletions from the Audit DB if you want to keep on using it to browse the lineage. Or delete from Range DB after 1 or 2 months but keep all data in HDFS. To reduce the amount of audit data, you can also revisit your security policies and disable audit on policies which are not critical. Also, a new version of Atlas is coming, which will work together with Ranger to provide more cool features for lineage.