HDP 18.104.22.168-169 with Ranger 0.5.0.2.4.
We're running out of disk space due to a huge xa_access_audit.ibd file. It's ~170GB in size, and I can't figure out how to truncate it.
First, I tried with mysql query:
delete from xa_access_audit where event_time < NOW() - INTERVAL 80 DAY;
but it takes hours to complete (and it times out occasionally). It doesn't actually truncate the file -- once the query is executed I still see the same size of xa_access_audit.ibd.
I reached the stage (INTERVAL 71 DAY) where if I execute 'delete from xa_access_audit where event_time < NOW() - INTERVAL 71 DAY;' multiple times I still see rows being deleted, so I don't quite understand how this is possible.
I'm also a bit concerned about the number of events submitted -- it's ~5 million per day. Is this normal?
Thank you and any tips would be appreciated.
By deleting rows from XA_ACCESS_AUDIT table you won't able to free disk space because when we remove rows, they are just marked as deleted on disk but space will be consumed by .ibd files which can be re-used later when you insert/update more rows but it will never shrink.
But, if you are using innodb_file_per_table then you can reclaim the space by running OPTIMIZE TABLE on that table. OPTIMIZE TABLE will create a new identical empty table. Then it will copy row by row data from old table to the new one. In this process, a new .ibd tablespace will be created and space will be reclaimed.
I would like to mention here that during this process the table will be locked.(Table locked for just Writes) Which can affect performance when you’ll have any table with a large number of rows.
From your query, it seems you want to keep 'n' days old data. In that case, you might have to plan deletion of 'n' day old data every day.
To reclaim some disk space quickly you can try steps given below: (If require take the help of DBA of your team)
1. Take the backup of required data in .csv file(assuming that you don't have much disk space available).
Sample Command :
SELECT create_time,update_time,added_by_id,upd_by_id,audit_type,access_result,access_type,acl_enforcer,agent_id,client_ip,client_type,policy_id,repo_name,repo_type,result_reason,session_id,event_time,request_user,action,request_data,resource_path,resource_type,seq_num,event_count,event_dur_ms from xa_access_audit WHERE date(event_time) >= NOW() - INTERVAL 30 DAY INTO OUTFILE '/tmp/xa_access_audit_backup20160906.csv' FIELDS TERMINATED BY '|' ENCLOSED BY '' LINES TERMINATED BY '\n';
Please verify the generated .csv file.
2. Truncate the table to reclaim disk space(this process will require downtime of few seconds).
Sample Command :
truncate table xa_access_audit;
3. Reload the audit logs from .csv file generated in step 1.
Sample Command :
LOAD DATA INFILE '/tmp/xa_access_audit_backup20160906.csv' INTO TABLE xa_access_audit FIELDS TERMINATED BY '|' ENCLOSED BY '' LINES TERMINATED BY '\n' (create_time,update_time,added_by_id,upd_by_id,audit_type,access_result,access_type,acl_enforcer,agent_id,client_ip,client_type,policy_id,repo_name,repo_type,result_reason,session_id,event_time,request_user,`action`,request_data,resource_path,resource_type,seq_num,event_count,event_dur_ms);
Note : Step 1 can be done in batches like datewise, monthwise or with fixed number of rows. Step 3 execution shall change accordingly.