Service Monitor doesn't start

Explorer

A few days ago I accidentally deleted some of the content of the /var/lib/cloudera-service-monitor/ts directory, where the monitor data is stored. Since then I have not been able to restart the Service Monitor because various exceptions are thrown. This is the latest one:

 

Failed to start Firehose

java.lang.RuntimeException: com.cloudera.cmon.tstore.leveldb.LDBPartitionManager$LDBPartitionException: Unable to open DB in directory /var/lib/cloudera-service-monitor/ts/stream/partitions/stream_2015-07-10T07:22:29.111Z for partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2015-07-10T07:22:29.111Z, startTime=2015-07-10T07:22:29.111Z, endTime=null, version=2, state=CLOSED}
    at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.getPartition(LDBPartitionManager.java:722)
    at com.cloudera.cmon.tstore.leveldb.LDBPartitionUtils.forPartition(LDBPartitionUtils.java:70)
    at com.cloudera.cmon.tstore.leveldb.LDBPartitionUtils.writeForPartition(LDBPartitionUtils.java:45)
    at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesStreamTable.write(LDBTimeSeriesStreamTable.java:118)
    at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesStreamTable.write(LDBTimeSeriesStreamTable.java:107)
    at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesStore.write(LDBTimeSeriesStore.java:236)
    at com.cloudera.cmon.tstore.AggregatingTimeSeriesStore.write(AggregatingTimeSeriesStore.java:219)
    at com.cloudera.cmon.kaiser.TimeSeriesHelper.insertInternalMetrics(TimeSeriesHelper.java:194)
    at com.cloudera.cmon.firehose.Firehose.insertStartupMetrics(Firehose.java:518)
    at com.cloudera.cmon.firehose.Firehose.<init>(Firehose.java:310)
    at com.cloudera.cmon.firehose.Main.main(Main.java:527)
Caused by: com.cloudera.cmon.tstore.leveldb.LDBPartitionManager$LDBPartitionException: Unable to open DB in directory /var/lib/cloudera-service-monitor/ts/stream/partitions/stream_2015-07-10T07:22:29.111Z for partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2015-07-10T07:22:29.111Z, startTime=2015-07-10T07:22:29.111Z, endTime=null, version=2, state=CLOSED}
    at com.cloudera.cmon.tstore.leveldb.LDBUtils.openOrCreatePartitionDB(LDBUtils.java:195)
    at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.getOrOpenInternal(LDBPartitionManager.java:616)
    at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.openOrCreatePartitionLDB(LDBPartitionManager.java:557)
    at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.getPartition(LDBPartitionManager.java:451)
    at com.cloudera.cmon.tstore.leveldb.LDBPartitionManager.getPartition(LDBPartitionManager.java:713)
    ... 10 more
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Invalid argument: /var/lib/cloudera-service-monitor/ts/stream/partitions/stream_2015-07-10T07:22:29.111Z: does not exist (create_if_missing is false)
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:194)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:212)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at com.cloudera.cmon.tstore.leveldb.LDBUtils.openOrCreatePartitionDB(LDBUtils.java:185)
    ... 14 more

 

 

When I check the filesystem, the directory exists, but I'm not able to solve the problem.

 

Can anyone help me?

 

Thanks in advance

 

1 ACCEPTED SOLUTION

Master Collaborator

This could be either a permissions issue under /var/lib/cloudera-service-monitor or corrupted LevelDB data.
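To tell the two cases apart, a quick check along these lines may help. It is only a sketch: `check_partition` is a made-up helper name, not part of Cloudera Manager. It relies on the fact that LevelDB reports "does not exist (create_if_missing is false)" when a directory has no CURRENT file, even if the directory itself is present.

```shell
#!/bin/sh
# check_partition DIR: hypothetical helper, not part of Cloudera Manager.
# Prints a one-line diagnosis for a LevelDB partition directory.
check_partition() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing-directory"
    elif [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        # The calling user cannot list or enter the directory.
        echo "not-accessible"
    elif [ ! -f "$dir/CURRENT" ]; then
        # LevelDB treats a directory without a CURRENT file as a nonexistent DB.
        echo "no-CURRENT-file"
    else
        echo "looks-present"
    fi
}
```

Run it against the directory named in the exception, as the account the Service Monitor runs as (often `cloudera-scm`, but that is an assumption to verify on your host). A result of `looks-present` does not rule out corruption; it only means the basic files are in place.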

Workaround: if you don't need to look back at past service events and just want to get SMON started, you can re-initialise the SMON LevelDB location.

 

1. Stop Service Monitor

2. [bash]$ mv /var/lib/cloudera-service-monitor /var/lib/cloudera-service-monitor.moved

3. Start SMON; this will initialise your Service Monitor LevelDB/ts data.
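Step 2 can also be wrapped in a small function so that a leftover copy from an earlier attempt is never overwritten. This is a sketch only: `move_aside` is a hypothetical helper, and the timestamp suffix is a variation on the plain `.moved` name used above.

```shell
#!/bin/sh
# move_aside DIR: hypothetical helper that renames DIR to
# DIR.moved.<timestamp> and prints the new name.
# Run it only while the Service Monitor is stopped.
move_aside() {
    src=$1
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    dest="$src.moved.$(date +%Y%m%d%H%M%S)"
    if [ -e "$dest" ]; then
        echo "already exists: $dest" >&2
        return 1
    fi
    mv "$src" "$dest" && echo "$dest"
}
```

For example, `move_aside /var/lib/cloudera-service-monitor` keeps the old data under a dated name, so it can be inspected or removed later once SMON is running again.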

 

Awaiting your feedback if this helps.

 

Michalis


4 REPLIES


Explorer

Yeah! It works, thank you very much.


Hi, just wanted to add that I had a similar problem with the service monitor and after moving the old directory it started.  The only significant thing I have to add is that I did not see any "error" labels in the start up error log file.

 

Thanks!!

Contributor

The steps below are written for the Host Monitor, but the procedure is the same for the Service Monitor; just change the directories accordingly.

 

You can salvage the contents of the Host Monitor by using the LDBStoreTool Java Class to repair the corrupted LDB:

 

  1. Make sure the Host Monitor is stopped completely (it should be since it is unable to open this LDB).
  2. Back up the /var/lib/cloudera-host-monitor directory with tar or cp.
  3. Run the LDBStoreTool Java class to try and bring the corrupt database to a consistent state (please adjust the directory to the one reported in the exception):
    java -cp "/usr/share/cmf/lib/*" com.cloudera.cmon.tstore.leveldb.tool.LDBStoreTool repair --directory /var/lib/cloudera-host-monitor/subject_record/subject_ts/partitions/subject_ts_2017-10-30T18:03:04.415Z
    [ main] log INFO Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    [ main] CMONConfiguration INFO Config: jar:file:/usr/share/cmf/common_jars/firehose-5.12.1.jar!/cmon.conf
    [ main] ConfigUtil WARN Could not find configuration file cmon-cm-auth.conf
    [ main] LDBResourceManager INFO Max file descriptors: 4096
    [ main] LDBResourceManager INFO Setting maximum open fds to: 2048
    Running repair command
    Success
     
  4. Start the Host Monitor; it should now start.
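Since step 3 needs the exact directory reported in the exception, it can be pulled out of the log text rather than copied by hand. The following is a minimal sketch, assuming the message has the "Unable to open DB in directory ... for partition" form shown in the stack trace earlier in this thread; `partition_dir_from_log` is a hypothetical name.

```shell
#!/bin/sh
# partition_dir_from_log: hypothetical helper. Reads log text on stdin and
# prints the first directory named in an
# "Unable to open DB in directory <path> for partition" message.
partition_dir_from_log() {
    sed -n 's/.*Unable to open DB in directory \([^ ]*\) for partition.*/\1/p' | head -n 1
}
```

Feed it the monitor's log file on stdin (the location depends on your installation), check that the printed path is the one you expect, and pass that path to the `--directory` option of the repair command in step 3.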

 

If the LDBStoreTool Java class is unable to repair the corrupt LDB, you will have to purge the /var/lib/cloudera-host-monitor directory, similar to the steps noted above by Michalis.