Created on 03-15-2019 11:05 AM - edited 03-15-2019 02:39 PM
In my cluster the Event Server fails to start. When I restart it, it shows the error message below:
---------
Process Status:
This role's process failed to start.
Cloudera Manager Descriptor Age
Not enough data to test: Test of whether the Cloudera Manager descriptor is up to date
Unknown : Not enough data to test: Test of whether the Cloudera Manager descriptor is up to date.
---------------
Someone suggested increasing the Java heap size of the Event Server to 1 GB, but I can see it is already 1 GB.
Can you help me understand why it failed to start, and what else I can check?
Created 03-20-2019 06:35 PM
Hi @MantuDeka ,
Thanks for providing the log file. The error message indicates that the Event Server index is corrupted. However, the log snippet does not show what caused the corruption.
One option is to attempt to repair the index. This involves several steps, and the index may still not be recoverable.
Alternatively, a quick fix is to start the Event Server with a fresh index. This does not affect cluster operation, but you will no longer be able to search previous events in the Cloudera Manager UI (CM -> Diagnostics -> Events page).
Here are the steps for the quick fix:
1. Stop the Event Server role instance in CM
2. Back up the current data directory. You can find its location in the CM UI by searching for "Event Server Index Directory"; the default is usually /var/lib/cloudera-scm-eventserver/. The backup is only a precaution and will most likely not be needed afterwards.
3. Empty the data directory: # rm -rf /var/lib/cloudera-scm-eventserver/*
4. Start the Event Server role instance in CM
5. Monitor the Event Server role logs in the /var/log/cloudera-scm-eventserver/ directory. They should confirm that the process starts up and operates normally.
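Steps 2 and 3 above can be sketched as a small shell function. This is only an illustration: backup_and_clear_index is a hypothetical helper name, not a Cloudera tool, and the backup location in the usage line is an assumption. Run it only after the Event Server role has been stopped in CM, and keep the backup file outside the index directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cloudera Manager): archive the Event Server
# index directory, then empty it so the role starts with a fresh index.
backup_and_clear_index() {
    index_dir=$1      # e.g. /var/lib/cloudera-scm-eventserver (default value)
    backup_file=$2    # assumed location, must be outside index_dir

    # Refuse to continue if either argument is missing or the directory is absent.
    if [ -z "$index_dir" ] || [ -z "$backup_file" ] || [ ! -d "$index_dir" ]; then
        echo "usage: backup_and_clear_index <index-dir> <backup-file.tar.gz>" >&2
        return 1
    fi

    # Step 2: archive the current contents; stop if the backup fails.
    tar -czf "$backup_file" -C "$index_dir" . || return 1

    # Step 3: delete everything inside the directory but keep the directory
    # itself, so its ownership and permissions stay as they were.
    find "$index_dir" -mindepth 1 -delete
}

# Example call, run as root on the Event Server host:
#   backup_and_clear_index /var/lib/cloudera-scm-eventserver /root/eventserver-index-backup.tar.gz
```

Unlike a plain rm -rf on a directory/* pattern, the find call also removes hidden files, and the function never deletes anything unless the archive was written first.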
Thanks,
Li
Li Wang, Technical Solution Manager
Created 03-19-2019 08:51 AM
Please check for errors in the Event Server role log and in its stderr/stdout logs. Also check /var/log/messages to see whether the process was killed by the kernel OOM killer.
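A quick way to look for OOM killer activity is to grep the system log for the messages the kernel writes when it kills a process. The sketch below assumes the log is at /var/log/messages (on some distributions it is /var/log/syslog instead); find_oom_kills is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print system log lines that mention the kernel OOM killer.
# The default path is an assumption; pass another file as the first argument.
find_oom_kills() {
    log_file=${1:-/var/log/messages}
    # Typical kernel lines contain "Out of memory", "invoked oom-killer"
    # or "Killed process <pid> (java)".
    grep -i -E 'out of memory|oom-killer|killed process' "$log_file"
}

# Example call, run as root:
#   find_oom_kills /var/log/messages | grep -i java
```

If this prints nothing, the process was probably not killed by the kernel and the role logs are the next place to look.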
Created on 03-20-2019 04:56 PM - edited 03-20-2019 04:57 PM
Hi gzigldrum, lwang,
Below is the error from the Event Server log:
Error starting EventServer
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:207)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:40)
    at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:71)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:260)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:168)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1155)
    at com.cloudera.cmf.eventcatcher.server.SingleIndexManager.makeIndexWriter(SingleIndexManager.java:139)
    at com.cloudera.cmf.eventcatcher.server.SingleIndexManager.<init>(SingleIndexManager.java:112)
    at com.cloudera.cmf.eventcatcher.server.EventCatcherService.<init>(EventCatcherService.java:282)
    at com.cloudera.cmf.eventcatcher.server.EventCatcherService.main(EventCatcherService.java:148)
Created on 03-22-2019 11:29 AM - edited 03-22-2019 11:36 AM
Thanks lwang for your response.
Is this the reason we are not getting health check alerts by email? The Reports Manager also fails to start, with the same issue.
Below is the health test for both.
We get this error when trying to restart the Reports Manager:
A server error has occurred. Send the following information to Cloudera.
Created 03-25-2019 11:10 AM
Hi @MantuDeka ,
Yes, if the Event Server is not working, you won't receive alerts. Alerts are simply events that are marked as alerts based on the CM configuration.
For the Reports Manager, we need to check the process logs and the role log for more details about what went wrong. By default:
1) On the Reports Manager host, find the process logs:
/var/run/cloudera-scm-agent/process/<process-ID>-cloudera-mgmt-REPORTSMANAGER/logs/stderr.log and stdout.log
2) On the same host, find the role log:
/var/log/cloudera-scm-headlamp/mgmt-cmf-mgmt-REPORTSMANAGER-<hostname>.log.out
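Because a new <process-ID> directory is created on each start attempt, it helps to pick the most recent one. The sketch below assumes the newest directory by modification time belongs to the latest attempt; latest_process_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified agent process
# directory for a management role such as REPORTSMANAGER or EVENTSERVER.
latest_process_dir() {
    role=$1
    base=${2:-/var/run/cloudera-scm-agent/process}   # default path from the post
    # ls -dt lists the matching directories newest first; keep only the first.
    ls -dt "$base"/*-cloudera-mgmt-"$role" 2>/dev/null | head -n 1
}

# Example call, run as root on the role's host:
#   tail -n 100 "$(latest_process_dir REPORTSMANAGER)/logs/stderr.log"
```

The same function works for the Event Server by passing EVENTSERVER as the role.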
Thanks and hope it helps,
Li
Li Wang, Technical Solution Manager
Created on 03-19-2019 03:42 PM - edited 03-19-2019 03:43 PM
Hi @MantuDeka ,
To second @gzigldrum's feedback, you can find more information in the logs. By default:
1) On the Event Server host, find the process logs:
/var/run/cloudera-scm-agent/process/<process-ID>-cloudera-mgmt-EVENTSERVER/logs/stderr.log and stdout.log
2) On the same host, find the role log:
/var/log/cloudera-scm-eventserver/mgmt-cmf-mgmt-EVENTSERVER-<hostname>.log.out
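Once you have the role log, a simple filter can surface the lines worth posting. This is a minimal sketch: show_role_log_errors is a hypothetical helper name, and the search pattern is an assumption about what error lines look like rather than a documented log format.

```shell
#!/bin/sh
# Hypothetical helper: print the last matching error or exception lines
# (with line numbers) from a role log file.
show_role_log_errors() {
    log_file=$1
    max_lines=${2:-20}
    # Match log levels ERROR/FATAL and any Java exception class name.
    grep -n -E 'ERROR|FATAL|Exception' "$log_file" | tail -n "$max_lines"
}

# Example call, with the role log path from the post:
#   show_role_log_errors /var/log/cloudera-scm-eventserver/mgmt-cmf-mgmt-EVENTSERVER-<hostname>.log.out
```

Replace <hostname> with the actual host name of the Event Server before running the example.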
Thanks and hope it helps,
Li
Li Wang, Technical Solution Manager