/atsv2 is large. How to purge old data?

Explorer

Hello,

I see that the path /atsv2 in HDFS is large and still growing. This path contains the embedded HBase data of the YARN Timeline Service v2 (ATSv2).


Can anyone explain what this path is for, and how to purge old data?
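For reference, the size and growth of the path can be checked with the standard HDFS disk-usage commands (using the /atsv2 path mentioned above):

```shell
# Total size of the ATSv2 data in HDFS
hdfs dfs -du -s -h /atsv2

# Per-subdirectory breakdown (tables, WALs, etc.)
hdfs dfs -du -h /atsv2
```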


Thanks.

1 ACCEPTED SOLUTION

Contributor

Hi @son trinh !


By default, the ATSv2 tables keep data for 30 days, so you should shorten this retention to get a smaller ATSv2 footprint.

You can change this by lowering the TTL on the tables' column families, for example setting expiration to 15 days (= 1296000 seconds).
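The seconds value is just the retention period in days converted to seconds; a quick sanity check of that arithmetic (15 days here is only the example retention, not a required value):

```shell
# Convert a retention period in days to an HBase TTL in seconds
days=15
ttl=$(( days * 24 * 60 * 60 ))
echo "$ttl"   # prints 1296000
```

The 30-day default corresponds in the same way to a TTL of 2592000 seconds.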

Assuming you're running HBase in embedded mode for ATSv2 (note that ATSv2 HBase can also run in Service mode):

Run this as the yarn-ats user, with a Kerberos ticket if on a kerberized environment:

hbase --config /etc/hadoop/conf/embedded-yarn-ats-hbase shell


and inside the hbase shell, run these:


alter 'prod.timelineservice.application', {NAME => 'm', TTL => 1296000}
alter 'prod.timelineservice.subapplication', {NAME => 'm', TTL => 1296000}
alter 'prod.timelineservice.entity', {NAME => 'm', TTL => 1296000}


That should keep the ATSv2 db smaller.
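One caveat worth knowing: the TTL only marks cells as expired; the space in HDFS is actually reclaimed when the store files are rewritten during major compaction. If you don't want to wait for the next automatic compaction, a sketch of verifying the new TTL and compacting manually, using the same embedded-mode shell as above (`describe` and `major_compact` are standard HBase shell commands; the table names are the ones altered above):

```shell
# Feed verification and compaction commands to the embedded ATSv2 hbase shell
hbase --config /etc/hadoop/conf/embedded-yarn-ats-hbase shell <<'EOF'
describe 'prod.timelineservice.application'
major_compact 'prod.timelineservice.application'
major_compact 'prod.timelineservice.subapplication'
major_compact 'prod.timelineservice.entity'
EOF
```

The `describe` output should show TTL => '1296000' on the 'm' column family once the alter has been applied.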


Regards

--

Tomas


4 REPLIES


Explorer

Hi Tomas,


Many thanks. I will try following your guide.

Contributor

Great, @son trinh, let me know how it goes.


Also, if you don't get back the amount of disk space you need, we can set TTLs on the rest of the data too.

My recommendation above is for the metrics column families on the tables (e.g. how much memory and CPU each container used). That data is the least important and is also the only data that comes with an expiration period by default, so with the change above you don't lose job execution metadata (what was executed, where and when, exit status, etc.). But if required, and you are OK with losing that information after the retention period as well, we could also get the rest of the ATSv2 data to expire with:

alter 'prod.timelineservice.application', {NAME => 'c', TTL => 1296000}
alter 'prod.timelineservice.application', {NAME => 'i', TTL => 1296000}
alter 'prod.timelineservice.app_flow', {NAME => 'm', TTL => 1296000}
alter 'prod.timelineservice.entity', {NAME => 'c', TTL => 1296000}
alter 'prod.timelineservice.entity', {NAME => 'i', TTL => 1296000}
alter 'prod.timelineservice.flowrun', {NAME => 'i', TTL => 1296000}
alter 'prod.timelineservice.flowactivity', {NAME => 'i', TTL => 1296000}
alter 'prod.timelineservice.subapplication', {NAME => 'c', TTL => 1296000}
alter 'prod.timelineservice.subapplication', {NAME => 'i', TTL => 1296000}
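As a sketch, these statements can also be generated and piped into the embedded-mode shell in one pass; the table and column-family pairs are exactly the ones listed above, and the TTL value can be adjusted to your retention:

```shell
# Emit the nine alter statements above and feed them to the hbase shell
TTL=1296000
{
  for t in application entity subapplication; do
    for cf in c i; do
      echo "alter 'prod.timelineservice.${t}', {NAME => '${cf}', TTL => ${TTL}}"
    done
  done
  echo "alter 'prod.timelineservice.app_flow', {NAME => 'm', TTL => ${TTL}}"
  echo "alter 'prod.timelineservice.flowrun', {NAME => 'i', TTL => ${TTL}}"
  echo "alter 'prod.timelineservice.flowactivity', {NAME => 'i', TTL => ${TTL}}"
} | hbase --config /etc/hadoop/conf/embedded-yarn-ats-hbase shell
```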

Regards,

--

Tomas

Explorer

Hi Tomas,


I applied these changes and ran a compaction manually. The size of /atsv2 is now smaller.


Many thanks!