Retention policy for Cloudera Management Services

Expert Contributor

Hello,

How do I change the data (log) retention policy for all Cloudera Management Services, such as Service Monitor, Host Monitor and Event Server? My goal is to retain only the last 7 days of data (logs).

Cloudera Enterprise 5.11.1

Regards

Wert

1 ACCEPTED SOLUTION

Expert Contributor

Hello Wert,

 

According to the information you provided, the free space is:

16.09% (free: 3.2 GiB) in /var/lib/cloudera-host-monitor
35.25% (free: 4.9 GiB) in /var/log/cloudera-scm-eventserver
35.81% (free: 5.0 GiB) in /var/log/cloudera-scm-alertpublisher

This explains the low disk space alerts: each of these values is below the free-space threshold reported in the corresponding alert.

 

The data in "/var/lib/cloudera-[host|service]-monitor" is the total working data for these respective services. It includes time-series metrics and health data, whose size is controlled by the Time-Series Storage setting (firehose_time_series_storage_bytes: 10 GB default, 10 GB minimum).

 

My suggestions:

 

1.) Move the storage directory from its default location ("/var/lib/cloudera-[host|service]-monitor") to another location in your environment with enough free space.

>> Stop the role (Service Monitor or Host Monitor).

>> (Optional, only if you need the old data) Copy the contents of the current directory to the new directory.

>> Update the Storage Directory configuration option (firehose.storage.base.directory) on the corresponding role's configuration page.

>> Start the Service Monitor or Host Monitor.


2.) If the data in "/var/lib/cloudera-host-monitor" is not important to you, you can remove it manually, but this is not a recommended step.

Your health statuses will be Unknown or Bad for a short time, and all charts in the UI will be lost while the time-series store is rebuilt and repopulated, because you are deleting ALL of the historical metrics. This should not have an impact on any service.

 

3.) Either add more disk to the cluster or remove unused/unnecessary files from the disk to free up space.
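For option 1, the optional copy step could be scripted roughly as below. This is a minimal sketch under stated assumptions: copy_monitor_store is a hypothetical helper, the role has already been stopped in Cloudera Manager, and it is run as a user (typically root) that can read the old directory and preserve file ownership. Updating firehose.storage.base.directory and starting the role still happen in Cloudera Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a Host/Service Monitor storage directory to a new location.
# Assumes the role is already stopped and the caller can preserve ownership (usually root).
copy_monitor_store() {
  local src="$1" dst="$2"
  local need_kb free_kb

  # Refuse to continue if the source directory does not exist.
  if [[ ! -d "$src" ]]; then
    echo "source directory not found: $src" >&2
    return 1
  fi

  # Create the destination so df can report on its filesystem.
  mkdir -p "$dst" || return 1

  # Space used by the source vs. space available at the destination, in KiB.
  need_kb=$(du -sk "$src" | awk '{print $1}')
  free_kb=$(df -Pk "$dst" | awk 'NR==2 {print $4}')
  if (( need_kb > free_kb )); then
    echo "not enough space in $dst: need ${need_kb} KiB, have ${free_kb} KiB" >&2
    return 1
  fi

  # -a: recursive, preserving ownership, permissions and timestamps where allowed.
  # "$src"/. copies the directory's contents, including hidden files.
  cp -a "$src"/. "$dst"/ || return 1
  echo "copied $src to $dst"
}
```

Example (the destination path is made up): copy_monitor_store /var/lib/cloudera-host-monitor /data/cloudera-host-monitor. Copying rather than moving leaves the original data in place until the role has started cleanly from the new location.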

 

Regards.


Expert Contributor

Hello,

 

With the "firehose_time_series_storage_bytes" parameter in Cloudera Manager, you can control the approximate amount of disk space dedicated to storing time-series and health data. Once the store reaches its maximum size, older data is deleted to make room for newer data. The disk usage is approximate because data is deleted only when the limit is reached.

 

However, time-based log retention does not appear to be configurable. As a workaround, you can write a shell script that removes the data from the "Service Monitor Storage Directory" every 7 days.
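Such a script could look roughly like the sketch below. purge_older_than is a hypothetical helper, not a Cloudera tool: it lists regular files older than a given number of days and deletes them only when --delete is passed. It assumes the files under the target directory can be removed one by one. That holds for rotated log files, but it is only an assumption for the Service Monitor storage directory, whose files may depend on each other; if you point it there, stop the role first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list (default) or delete (--delete) regular files older than N days.
purge_older_than() {
  local dir="$1" days="$2" mode="${3:-}"

  # Guard against empty, root or missing paths.
  if [[ -z "$dir" || "$dir" == "/" || ! -d "$dir" ]]; then
    echo "refusing to purge '$dir'" >&2
    return 1
  fi

  # Require a whole, non-negative number of days.
  if ! [[ "$days" =~ ^[0-9]+$ ]]; then
    echo "days must be a non-negative integer: '$days'" >&2
    return 1
  fi

  # -mtime +N matches files last modified more than N whole 24-hour periods ago.
  if [[ "$mode" == "--delete" ]]; then
    find "$dir" -type f -mtime +"$days" -print -delete
  else
    find "$dir" -type f -mtime +"$days" -print
  fi
}
```

Run it once without --delete to review what would be removed (for example purge_older_than /var/log/cloudera-scm-firehose 7), then schedule the --delete variant from cron if the list looks right. Files that are still being written keep a recent modification time, so they are not matched.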

 

By default, the data is stored in /var/lib/cloudera-service-monitor/ on the Service Monitor host. You can change this by modifying the Service Monitor Storage Directory configuration (firehose.storage.base.directory). But this step is not recommended by Cloudera.

Super Collaborator

As I understand it, the question was not about time-series storage but about the actual role log files; is that correct, @wert_1311? Please note that only log file size options are available for Cloudera Management Services and cluster services; there are no time-based options.
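Since only size-based options exist, one rough way to approximate a time window is to size log rotation from an observed daily log volume. The sketch below assumes each role has settings along the lines of a maximum log file size and a number of rotated backups (check the exact names on the role's configuration page), and that total log usage is roughly max size × (backups + 1). backups_for_days is a hypothetical helper, and the daily volume is something you would have to measure yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many rotated backups keep roughly N days of logs?
# $1 = days to keep, $2 = log volume written per day (MiB), $3 = max size of one log file (MiB)
backups_for_days() {
  local days="$1" per_day_mib="$2" max_mib="$3"
  local needed files backups

  if (( max_mib <= 0 )); then
    echo "max log size must be positive" >&2
    return 1
  fi

  # Total MiB needed to hold the requested number of days.
  needed=$(( days * per_day_mib ))
  # Ceiling division: number of files of max_mib each needed to hold that much.
  files=$(( (needed + max_mib - 1) / max_mib ))
  # One file is the active log; the rest are rotated backups.
  backups=$(( files > 0 ? files - 1 : 0 ))
  echo "$backups"
}
```

For example, backups_for_days 7 50 200 prints 1: 7 days × 50 MiB = 350 MiB, which needs two 200 MiB files, i.e. the active log plus one backup. Because rotation is size-based, the actual time covered will drift whenever the daily volume changes.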

Expert Contributor

Hello Tony,

 

Thank you for your reply. Since there is no built-in mechanism to delete logs after 7 days, I guess the best option for us would be to adjust the "firehose_time_series_storage_bytes" property. I have some questions about it:

 

  1. Currently the Time Series Storage is set to 10 GiB for Host Monitor and Service Monitor (screenshot attached); I presume this is the default setting. Can it be lowered? If yes, will lowering it to, for example, 5 GiB get rid of the data/logs, and once the status is green, can we change it back to 10 GiB?
  2. As mentioned, "data is deleted only when the limit is reached". Is that limit 10 GiB? Will the oldest data/logs be deleted once this 10 GiB threshold is reached?
  3. Does changing the parameter (as mentioned in point 1) require me to restart the Host Monitor and Service Monitor roles?

 

[Screenshot: HMON_SMON.JPG]

 

 

Thanks

Wert

Super Collaborator

To answer your questions:

 

  1. The Time Series Storage limit can be configured, but the minimum value is 10 GB.
  2. Data is deleted from the time-series store when this limit is reached.
  3. Changing this configuration parameter requires a restart of the role instance.

Please note that this applies only to time-series data, which is used to populate the charts in Cloudera Manager. It does not affect the size of the role log files, e.g. in the /var/log/cloudera-scm-firehose/ directory, which you mention again ("/logs") in your recent post.
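Regarding question 1, the arithmetic below shows why a 5 GiB value is not accepted. It is a sketch that assumes firehose_time_series_storage_bytes is a byte count and that the minimum is exactly 10 GiB (the thread says "10 GB"); ts_storage_bytes is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Assumed minimum for firehose_time_series_storage_bytes: 10 GiB expressed in bytes.
MIN_TS_STORAGE_BYTES=$(( 10 * 1024 * 1024 * 1024 ))

# Hypothetical helper: convert a whole number of GiB to bytes and check the minimum.
ts_storage_bytes() {
  local gib="$1" bytes
  bytes=$(( gib * 1024 * 1024 * 1024 ))
  if (( bytes < MIN_TS_STORAGE_BYTES )); then
    echo "${gib} GiB is below the 10 GiB minimum" >&2
    return 1
  fi
  echo "$bytes"
}
```

For example, ts_storage_bytes 10 prints 10737418240, while ts_storage_bytes 5 fails. Lowering the limit below the minimum is therefore not an option for freeing space.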

Expert Contributor

Hello Gzigldrum,

 

Thanks for your reply. My issue is that the health status of Host Monitor Health, Event Server Health and Alert Publisher Health shows as Bad (red) due to a directory free space issue. Is there anything I can check or change to fix this? Will deleting the logs from the location you mentioned in your previous post help reduce the alerts and bring us back into the green zone?

 

Appreciate any help in this regard.

 

Regards

Wert

Super Collaborator
Please show the exact alert message; we need to know which directory the free space warning is raised for.

This is likely the /var mount point. In that case, please confirm the actual disk usage with these commands:

# df -h /var
# du -hs /var/*
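If du cannot read every directory, df alone is usually enough to see which filesystem is short of space, because it only needs to look up the path rather than read its contents. The sketch below (check_free_space is a hypothetical helper) reports free space for each directory and flags those below a whole-number GiB threshold, similar in spirit to the "Free Space" health tests; pass the threshold shown in your alert, e.g. 5 for 5.0 GiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_free_space THRESHOLD_GIB DIR...
# Prints OK, LOW or UNKNOWN per directory; returns 1 if any directory is LOW or UNKNOWN.
check_free_space() {
  local threshold_gib="$1"; shift
  local threshold_kb=$(( threshold_gib * 1024 * 1024 ))
  local status=0 dir avail_kb

  for dir in "$@"; do
    # df -P prints one POSIX-format line per filesystem; field 4 is available KiB.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    if [[ -z "$avail_kb" ]]; then
      echo "UNKNOWN $dir"
      status=1
      continue
    fi
    if (( avail_kb < threshold_kb )); then
      echo "LOW $dir ($(( avail_kb / 1024 )) MiB free)"
      status=1
    else
      echo "OK $dir ($(( avail_kb / 1024 )) MiB free)"
    fi
  done
  return "$status"
}
```

For example: check_free_space 5 /var/lib/cloudera-host-monitor /var/log/cloudera-scm-eventserver. Directories that share a filesystem report the same free space, which helps confirm whether a single mount point such as /var is behind several alerts.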

Expert Contributor

Hi Gzigldrum,

 

Following are the details you requested. Unfortunately, I am unable to run (du -hs /var/*); it seems I do not have the required permissions.

 

Host Monitor Error

 

Host Monitor Storage Directory Free Space:

This role's Host Monitor Storage Directory is on a filesystem with less than 5.0 GiB of its space free. /var/lib/cloudera-host-monitor (free: 3.2 GiB (16.09%), capacity: 20.0 GiB)

***********************************

Event Server Error

 

Log Directory Free Space:

This role's Log Directory is on a filesystem with less than 5.0 GiB of its space free. /var/log/cloudera-scm-eventserver (free: 4.9 GiB (35.25%), capacity: 14.0 GiB)

***********************************

Alert Publisher Error

 

Log Directory Free Space:

This role's Log Directory is on a filesystem with less than 10.0 GiB of its space free. /var/log/cloudera-scm-alertpublisher (free: 5.0 GiB (35.81%), capacity: 14.0 GiB)

[Screenshot: Capture.JPG]

Expert Contributor
Hi Tony,

Thanks for your reply; I appreciate all the help provided by you and Gzigldrum.


Regards
Wert