
/atsv2 large size. How to purge old data?

New Contributor

Hello,

I see that the path /atsv2 in HDFS is large and still growing. This path contains the embedded HBase data of the YARN Timeline Service v2 (ATSv2).

Can anyone explain what this path holds, and how to purge old data from it?


Thanks.

4 REPLIES

Re: /atsv2 large size. How to purge old data?

Cloudera Employee

Hi @son trinh!


By default, the ATSv2 tables keep data for 30 days, so you can reduce that retention to shrink the ATSv2 footprint.

You can do this by lowering the TTL on the tables, for example setting the expiration to 15 days (= 1296000 seconds).
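HBase TTLs are expressed in seconds, so the value is just the retention period in days times 86400. A quick sanity check of the arithmetic:

```python
# HBase TTL values are in seconds.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def retention_to_ttl(days):
    """Convert a retention period in days to an HBase TTL in seconds."""
    return days * SECONDS_PER_DAY

print(retention_to_ttl(15))  # 15-day retention -> 1296000
print(retention_to_ttl(30))  # default 30-day retention -> 2592000
```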

Assuming you're running HBase in embedded mode for ATSv2 (note that the ATSv2 HBase can also run in service mode):

Run the following as the yarn-ats user, with a Kerberos ticket if the environment is kerberized:

hbase --config /etc/hadoop/conf/embedded-yarn-ats-hbase shell


and inside the hbase shell, run these:


alter 'prod.timelineservice.application', {NAME => 'm', TTL => 1296000}
alter 'prod.timelineservice.subapplication', {NAME => 'm', TTL => 1296000}
alter 'prod.timelineservice.entity', {NAME => 'm', TTL => 1296000}


That should keep the ATSv2 db smaller.


Regards

--

Tomas

Re: /atsv2 large size. How to purge old data?

New Contributor

Hi Tomas,


Many thanks. I will try to follow your guide.

Re: /atsv2 large size. How to purge old data?

Cloudera Employee

Great, @son trinh, let me know how it goes.


Also, if you don't reclaim as much disk space as you need, you can set TTLs on the other column families as well.

My recommendation above covers only the metrics column family ('m') on those tables, which holds per-container resource usage such as memory and CPU. Those metrics are the least important data and the only data that comes with an expiration period by default, so expiring them does not lose job execution metadata (what was executed, where and when, exit status, and so on). If you need more space back and are OK with losing that metadata after the retention period, you can make the rest of the ATSv2 data expire as well:

alter 'prod.timelineservice.application', {NAME => 'c', TTL => 1296000}
alter 'prod.timelineservice.application', {NAME => 'i', TTL => 1296000}
alter 'prod.timelineservice.app_flow', {NAME => 'm', TTL => 1296000}
alter 'prod.timelineservice.entity', {NAME => 'c', TTL => 1296000}
alter 'prod.timelineservice.entity', {NAME => 'i', TTL => 1296000}
alter 'prod.timelineservice.flowrun', {NAME => 'i', TTL => 1296000}
alter 'prod.timelineservice.flowactivity', {NAME => 'i', TTL => 1296000}
alter 'prod.timelineservice.subapplication', {NAME => 'c', TTL => 1296000}
alter 'prod.timelineservice.subapplication', {NAME => 'i', TTL => 1296000}
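If you prefer to generate these statements rather than type them by hand, here is a small sketch; the table and column-family names are exactly the ones from the statements above, and the mapping of tables to families is just a transcription of those nine lines:

```python
# Tables and the column families to expire, transcribed from the
# alter statements above. TTL is 15 days in seconds.
TTL_SECONDS = 1296000

FAMILIES = {
    'prod.timelineservice.application': ['c', 'i'],
    'prod.timelineservice.app_flow': ['m'],
    'prod.timelineservice.entity': ['c', 'i'],
    'prod.timelineservice.flowrun': ['i'],
    'prod.timelineservice.flowactivity': ['i'],
    'prod.timelineservice.subapplication': ['c', 'i'],
}

def alter_statements(families, ttl):
    """Emit one hbase-shell alter statement per table/column-family pair."""
    return [
        "alter '%s', {NAME => '%s', TTL => %d}" % (table, cf, ttl)
        for table, cfs in sorted(families.items())
        for cf in cfs
    ]

for stmt in alter_statements(FAMILIES, TTL_SECONDS):
    print(stmt)
```

You can paste the printed statements into the hbase shell session opened as described earlier.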

Regards,

--

Tomas

Re: /atsv2 large size. How to purge old data?

New Contributor

Hi Tomas,


I applied these changes and ran a compaction manually. The size of /atsv2 is now smaller.


Many thanks!
