Could I delete files in /var/lib?

Contributor

I'm running a one-node CDH4 system for testing.

 

I've found that the size of /var/lib increases quickly. Can I delete those files?

1 ACCEPTED SOLUTION

Super Collaborator

As a general rule, you should never directly delete on-disk files that belong to a database's data files. Instead, determine which application or process is writing data into the database. The appropriate place to address the growth is through that application, not by deleting files on disk, or even by logging into postgres with psql and making manual changes.

 

In this case, I am fairly sure your Cloudera Manager deployment is using postgres running on port 7432 to house these databases:

Cloudera Manager

Activity Monitor

Service Monitor

Host Monitor

Reports Manager

 

None of these should be manually altered by logging into the database.

 

This is relevant for Cloudera Manager 4.x only:

Cause:
Cloudera Manager's Management Services use various databases to store the data they gather. These databases should be located where sufficient space is available to accommodate their growth. If this is not planned for, their default locations may sit on a volume with insufficient space, and that volume could fill to 100%.
Instructions:
The rate of growth of these databases is controlled solely by the purge/expiration configuration of each service (Activity Monitor, Service Monitor, Host Monitor) individually. These settings control how many hours or days of monitoring data are kept, and therefore directly influence how large the databases grow. Use them to put a bound on that growth:

1. Host Monitor:
        * Host Monitor Data Expiration Period (hours) (default 168 hours, or 7 days)

2. Service Monitor:
        * Service Monitor Data Expiration Period (hours) (default 168 hours, or 7 days)

3. Activity Monitor:
        * Purge Activities Data at This Age (default 336 hours, or 14 days)
        * Purge Attempts Data at This Age (default 336 hours, or 14 days)
        * Purge MapReduce Service Data at This Age (default 336 hours, or 14 days)

You can adjust these values downward and save them; the services will then expire data older than the new value in the background.

For example, if you are using the cloudera-scm-server-db ("embedded postgres") database for Cloudera Manager, all these databases will consume space in the default path /var/lib/cloudera-scm-server-db/data. You may wish to halve the retention periods for all the above services so this location does not fill up.
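As a rough illustration of what halving the retention periods means for the defaults listed above, the arithmetic can be sketched in shell. The helper name halve_hours is hypothetical, and the script changes nothing: the resulting values would still have to be entered in the Cloudera Manager configuration for each service.

```shell
#!/bin/sh
# Hypothetical helper: print half of a retention period given in hours.
halve_hours() {
    echo $(( $1 / 2 ))
}

# Default values quoted in the list above, in hours.
echo "Host Monitor Data Expiration Period:    $(halve_hours 168) hours"
echo "Service Monitor Data Expiration Period: $(halve_hours 168) hours"
echo "Activity Monitor purge settings:        $(halve_hours 336) hours"
```

With the defaults above, this gives 84 hours for Host Monitor and Service Monitor and 168 hours for the three Activity Monitor purge settings.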

This is also discussed briefly under the header titled "Configuring Cloudera Management Services Database Limits" in our documentation: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.5.4/Cloudera-Manager-Enterpr...

 

 

TL;DR:

Find and adjust the Cloudera Manager Management Services purge/expiration settings that control the on-disk size of the databases, or allocate more space to the partition where /var resides.

 

 

17 REPLIES

Contributor

I have tried to figure out which database each oid maps to. However, I can't get into the PostgreSQL console.

 

When I run "psql" in the terminal, it returns

 

psql: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

 

It seems that the embedded PostgreSQL is different from the native one, so none of the solutions I found on the web work.

 

Any suggestions for this problem?

 

 

Here is some info that may be helpful.

When I run "ps -elf | grep postmaster", it returns

0 S 101      30238     1  0  75   0 - 540004 -     Jun19 ?        00:00:01 /usr/bin/postmaster -D /var/lib/cloudera-scm-server-db/data

 

When I run "ps -elf | grep post", it returns

 

1 S 101       1255 30238  0  76   0 - 540765 -     10:08 ?        00:00:08 postgres: smon smon 192.168.28.40(33310) idle              
1 S 101       1256 30238  0  76   0 - 540749 -     10:08 ?        00:00:08 postgres: smon smon 192.168.28.40(33311) idle              
1 S 101       2085 30238  0  75   0 - 540765 -     10:23 ?        00:00:01 postgres: hmon hmon 192.168.28.40(33932) idle              
1 S 101       2086 30238  0  76   0 - 540772 -     10:23 ?        00:00:01 postgres: hmon hmon 192.168.28.40(33933) idle              
1 S 101       2932 30238  0  75   0 - 540765 -     10:38 ?        00:00:00 postgres: hmon hmon 192.168.28.40(58484) idle              
1 S 101       2935 30238  0  75   0 - 540749 -     10:38 ?        00:00:05 postgres: smon smon 192.168.28.40(58494) idle              
1 S 101       2936 30238  0  76   0 - 540749 -     10:38 ?        00:00:00 postgres: amon amon 192.168.28.40(58496) idle              
1 S 101       2937 30238  0  76   0 - 540749 -     10:38 ?        00:00:00 postgres: amon amon 192.168.28.40(58497) idle              
1 S 101       3804 30238  0  76   0 - 540749 -     10:53 ?        00:00:00 postgres: smon smon 192.168.28.40(33735) idle              
1 S 101       3806 30238  0  75   0 - 540772 -     10:53 ?        00:00:04 postgres: smon smon 192.168.28.40(33736) idle              
1 S 101       3807 30238  0  75   0 - 540778 -     10:53 ?        00:00:04 postgres: smon smon 192.168.28.40(33737) idle              
1 S 101       3808 30238  0  75   0 - 540765 -     10:53 ?        00:00:00 postgres: amon amon 192.168.28.40(33739) idle              
1 S 101       3809 30238  0  76   0 - 540749 -     10:53 ?        00:00:00 postgres: amon amon 192.168.28.40(33740) idle              
1 S 101       4718 30238  0  76   0 - 540749 -     11:08 ?        00:00:00 postgres: hmon hmon 192.168.28.40(38106) idle              
1 S 101       4719 30238  0  75   0 - 540749 -     11:08 ?        00:00:00 postgres: hmon hmon 192.168.28.40(38107) idle              
1 S 101       4720 30238  0  75   0 - 540765 -     11:08 ?        00:00:00 postgres: hmon hmon 192.168.28.40(38108) idle              
1 S 101       4722 30238  0  76   0 - 540765 -     11:08 ?        00:00:08 postgres: smon smon 192.168.28.40(38117) idle              
1 S 101       4723 30238  0  75   0 - 540772 -     11:08 ?        00:00:07 postgres: smon smon 192.168.28.40(38118) idle              
1 S 101       4724 30238  0  76   0 - 540749 -     11:08 ?        00:00:00 postgres: amon amon 192.168.28.40(38120) idle              
1 S 101       4725 30238  0  75   0 - 540749 -     11:08 ?        00:00:00 postgres: amon amon 192.168.28.40(38121) idle              
1 S 101       5637 30238  0  75   0 - 540765 -     11:23 ?        00:00:00 postgres: hmon hmon 192.168.28.40(33133) idle              
1 S 101       5638 30238  0  75   0 - 540660 -     11:23 ?        00:00:00 postgres: hmon hmon 192.168.28.40(33134) idle              
1 S 101       5639 30238  0  75   0 - 540749 -     11:23 ?        00:00:00 postgres: hmon hmon 192.168.28.40(33135) idle              
1 S 101       5641 30238  0  75   0 - 540749 -     11:23 ?        00:00:00 postgres: amon amon 192.168.28.40(33144) idle              
1 S 101       5642 30238  0  76   0 - 540749 -     11:23 ?        00:00:00 postgres: amon amon 192.168.28.40(33145) idle              
1 S 101       6789 30238  0  75   0 - 540749 -     11:38 ?        00:00:00 postgres: hmon hmon 192.168.28.40(45206) idle              
1 S 101       6791 30238  0  75   0 - 540749 -     11:38 ?        00:00:00 postgres: smon smon 192.168.28.40(45215) idle              
1 S 101       6792 30238  0  75   0 - 540749 -     11:38 ?        00:00:00 postgres: smon smon 192.168.28.40(45216) idle              
1 S 101       6793 30238  0  76   0 - 540749 -     11:38 ?        00:00:00 postgres: amon amon 192.168.28.40(45218) idle              
1 S 101       6794 30238  0  76   0 - 540749 -     11:38 ?        00:00:00 postgres: amon amon 192.168.28.40(45219) idle              
4 S root      6938  4666  0  78   0 - 15306 pipe_w 11:40 pts/0    00:00:00 grep post
1 S 101      19728 30238  0  75   0 - 540939 -     Jun19 ?        00:00:00 postgres: hive hive 192.168.28.40(55990) idle              
1 S 101      19730 30238  0  75   0 - 540642 -     Jun19 ?        00:00:00 postgres: hive hive 192.168.28.40(55993) idle              
4 S 101      21214  7449  0  80   0 - 204323 184466 Jun19 ?       00:12:37 /usr/java/jdk1.6.0_31/bin/java -server -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt1-HOSTMONITOR-master.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dfirehose.schema.dir=/usr/share/cmf/schema -Xms274384019 -Xmx274384019 -cp /var/run/cloudera-scm-agent/process/35-cloudera-mgmt-HOSTMONITOR:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/* com.cloudera.cmon.firehose.Main --pipeline-type HOST_MONITORING --mgmt-home /usr/share/cmf
4 S 101      21230  7449  0  82   0 - 181839 futex_ Jun19 ?       00:07:19 /usr/java/jdk1.6.0_31/bin/java -server -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt1-EVENTSERVER-master.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Xms274384019 -Xmx274384019 -cp /var/run/cloudera-scm-agent/process/36-cloudera-mgmt-EVENTSERVER:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/* com.cloudera.cmf.eventcatcher.server.EventCatcherService
4 S 101      21258  7449  0  80   0 - 182312 futex_ Jun19 ?       00:08:11 /usr/java/jdk1.6.0_31/bin/java -server -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt1-ACTIVITYMONITOR-master.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dfirehose.schema.dir=/usr/share/cmf/schema -Xms274384019 -Xmx274384019 -cp /var/run/cloudera-scm-agent/process/37-cloudera-mgmt-ACTIVITYMONITOR:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/* com.cloudera.cmon.firehose.Main --pipeline-type ACTIVITY_MONITORING_TREE --mgmt-home /usr/share/cmf
4 S 101      21280  7449  0  83   0 - 186964 184466 Jun19 ?       00:01:22 /usr/java/jdk1.6.0_31/bin/java -server -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt1-ALERTPUBLISHER-master.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Xms268435456 -Xmx268435456 -cp /var/run/cloudera-scm-agent/process/38-cloudera-mgmt-ALERTPUBLISHER:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/* com.cloudera.enterprise.alertpublisher.AlertPublisher
4 S 101      21296  7449  2  80   0 - 214066 184466 Jun19 ?       00:39:14 /usr/java/jdk1.6.0_31/bin/java -server -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt1-SERVICEMONITOR-master.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dfirehose.schema.dir=/usr/share/cmf/schema -XX:PermSize=128m -Dsun.rmi.transport.tcp.handshakeTimeout=10000 -Dsun.rmi.transport.tcp.responseTimeout=10000 -Xms274384019 -Xmx274384019 -cp /var/run/cloudera-scm-agent/process/39-cloudera-mgmt-SERVICEMONITOR:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/* com.cloudera.cmon.firehose.Main --pipeline-type SERVICE_MONITORING --mgmt-home /usr/share/cmf
1 S 101      22524 30238  0  75   0 - 541013 -     Jun19 ?        00:00:44 postgres: scm scm 127.0.0.1(43782) idle                    
1 S 101      22525 30238  0  75   0 - 540876 -     Jun19 ?        00:00:19 postgres: scm scm 127.0.0.1(43783) idle                    
1 S 101      22526 30238  0  75   0 - 541243 -     Jun19 ?        00:00:39 postgres: scm scm 127.0.0.1(43784) idle                    
0 S 101      30238     1  0  75   0 - 540004 -     Jun19 ?        00:00:01 /usr/bin/postmaster -D /var/lib/cloudera-scm-server-db/data
1 S 101      30243 30238  0  75   0 - 27484 -      Jun19 ?        00:00:00 postgres: logger process                                   
1 S 101      30246 30238  0  75   0 - 540282 -     Jun19 ?        00:00:19 postgres: writer process                                   
1 S 101      30247 30238  0  75   0 - 28251 -      Jun19 ?        00:00:00 postgres: stats buffer process                             
1 S 101      30248 30247  0  75   0 - 28169 -      Jun19 ?        00:00:00 postgres: stats collector process                          
1 S 101      30291 30238  0  76   0 - 541311 -     Jun19 ?        00:00:21 postgres: scm scm 127.0.0.1(39839) idle                    
1 S 101      30292 30238  0  76   0 - 541536 -     Jun19 ?        00:00:25 postgres: scm scm 127.0.0.1(39840) idle                    
1 S 101      30293 30238  0  76   0 - 541180 -     Jun19 ?        00:00:06 postgres: scm scm 127.0.0.1(39841) idle                    
1 S 101      30294 30238  0  76   0 - 541260 -     Jun19 ?        00:00:20 postgres: scm scm 127.0.0.1(39842) idle                    
1 S 101      30295 30238  0  75   0 - 541407 -     Jun19 ?        00:00:25 postgres: scm scm 127.0.0.1(39843) idle

Contributor

I finally found the solution in the Cloudera documentation!

 

I ran "psql -U smon -p7432" and got into the console.

 

In the large-size directory /var/lib/cloudera-scm-server-db/data/base/16387, there are 872 files.

The largest files are 24M, and there are about 150 such files.

 

Here are the relnames for these oids.

 

         relname          |  oid  
--------------------------+-------
 cmon_ll_dp_2014_06_20_11 | 37437
 cmon_ll_dp_2014_06_20_10 | 37415
 cmon_ll_dp_2014_06_20_09 | 37393

 

Any idea what these are? Thanks so much for your help!
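For anyone retracing this step, a read-only query along the following lines could list the largest tables in the database you are connected to, together with their oids. This is only a sketch: pg_class, pg_total_relation_size and pg_size_pretty are standard PostgreSQL catalog objects and functions, but the helper name largest_relations_sql is made up for illustration, and the smon user and port 7432 are simply the ones used above.

```shell
#!/bin/sh
# Illustrative helper: print a read-only SQL query that lists the N largest
# ordinary tables (relkind 'r') in the current database, biggest first.
largest_relations_sql() {
    limit="${1:-10}"
    cat <<EOF
SELECT relname, oid, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT ${limit};
EOF
}

# Print the query for the five largest tables; nothing is sent to a server here.
largest_relations_sql 5
```

The printed query could then be piped into the console opened above, for example: largest_relations_sql 5 | psql -U smon -p7432. It only reads catalog information and does not modify any data.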

Contributor
@smark, thanks so much for your suggestion!
After adjusting those parameters and restarting the Cloudera Manager service, the size of that directory came down to 1G.

New Contributor

Can we delete files/directories in /var/lib/cloudera-scm-server/commands?

Master Guru

@pchauras

Yes, you can, but don't delete the latest one from this directory, because you would then not be able to see recent commands in the CM web UI.

 

The size of this commands directory can be controlled under CM > Administration > Settings: search for "Command Eviction Age" and set it to a lower value, such as 90 days.

 

To learn more, see: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_ig_reqs_space.html#concept_tjd_4yc...
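Before removing anything by hand, it may help to see which entries are actually old. The sketch below only lists candidates and deletes nothing. It assumes that file modification time is a reasonable proxy for command age and that find supports -mindepth/-maxdepth (GNU and BSD find do); the helper name list_old_entries is hypothetical, and 90 days is just the example value mentioned above.

```shell
#!/bin/sh
# Illustrative helper: list entries directly under DIR whose modification time
# is more than DAYS days in the past. It only prints paths; it deletes nothing.
list_old_entries() {
    dir="$1"
    days="${2:-90}"
    find "$dir" -mindepth 1 -maxdepth 1 -mtime +"$days" -print
}

# Directory from the question above; skipped when it does not exist.
commands_dir=/var/lib/cloudera-scm-server/commands
if [ -d "$commands_dir" ]; then
    list_old_entries "$commands_dir" 90
fi
```

Review the printed list first and leave the most recent entries in place, as advised above.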


Cheers!

New Contributor

After changing the value of "Command Eviction Age", does it require a restart of any service?

Master Guru

@shravani A CM server restart is required:

systemctl restart cloudera-scm-server 

  


Cheers!