Event Server Restarts Continuously After Upgrade from CDH5.3.2 to CDH5.4.2

As part of upgrading our cluster to CDH5.4.2, we had to upgrade Cloudera Manager to 5.4.1 so that the Impala Daemon metrics could be properly retrieved. After the upgrade to CM5.4.1, the Event Server health check reports two bad checks: "Process Status" and "Unexpected Exits". I looked into the Event Server logs (with the default log level set to INFO) and noticed that the Event Server constantly restarts itself, hence the two aforementioned bad checks. I found no obvious indication of errors in the logs that might explain the issue. Searching through online forums and this board, I found some people with the same issue, but they either had no resolution or their problem was slightly different. Is anybody else having the same problem I just described? This is blocking completion of the CDH5.4.2 upgrade for us. I'll be happy to share more info about our setup if I knew where else to look other than the logs and the CM Service page.

 

Thanks,

-Terry

 

1 ACCEPTED SOLUTION

Hi Terry,

Constant restarts usually happen when the role is repeatedly hitting out-of-memory errors. There is usually a note in the standard error output (NOT the role logs) indicating that there was an OOM. Can you share the stderr of the Event Server?

You should also make sure that the heap dumps from each OOM error aren't filling up your disks (assuming heap dumps are enabled in your configuration). You can see the directory where heap dumps are placed by looking at your Event Server configuration.
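
As a rough way to check whether heap dumps are piling up, something like the following Python sketch could be run on the Event Server host. It assumes the dumps carry the JVM's usual .hprof extension, and the directory you pass in is whatever your Event Server configuration shows; the function name heap_dump_report is made up for this example.

```python
import os
import shutil


def heap_dump_report(dump_dir):
    """Summarise .hprof files in dump_dir and the free space on its filesystem.

    dump_dir is an assumption: pass the heap dump directory shown in your
    Event Server configuration.
    """
    dumps = []
    for entry in os.scandir(dump_dir):
        # Only count regular files that look like JVM heap dumps
        if entry.is_file() and entry.name.endswith(".hprof"):
            dumps.append((entry.name, entry.stat().st_size))
    usage = shutil.disk_usage(dump_dir)
    return {
        "dump_count": len(dumps),
        "dump_bytes": sum(size for _, size in dumps),
        "free_bytes": usage.free,
    }
```

If dump_count keeps growing while free_bytes shrinks, the dumps are a likely cause of the disk filling up.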

Thanks,
Darren

Hi Darren,

Thanks for responding. Yes, I noticed that the heap size assigned to the Event Server had been set to an extremely low value (128MB) compared to the default of 1GB. I bumped it up to 512MB and no longer saw the restarts. I'm not sure how, or by whom in our organization, it was set so low. The only logs I knew of were the ones under /var/log/cloudera-scm-eventserver, and they contained only stdout. These logs showed no OOM or any other error, which led to confusion over what was going on. Is there a setting or log config I'm not aware of that would force stderr to be written to a file in that same directory? Also, thanks for the additional info regarding the heap dumps. This will all be useful if we ever encounter startup issues in the future.

 

-Terry

Hi Terry,

If you click on any role and then open the Processes tab, you can see the stderr and stdout for that process. This is usually the easiest way to view them.

These are never located under /var/log/ on the machine (that is where role logs usually go). On disk, stderr and stdout are in /var/run/cloudera-scm-agent/process/<directory corresponding to your command or process>/logs/
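
If you would rather check from a shell on the host, a minimal Python sketch along these lines could scan those directories for OOM messages. It makes a few assumptions that may not hold on your install: that the stderr file is named stderr.log, that the Event Server's process directory name contains "EVENTSERVER", and that the JVM reported the failure as java.lang.OutOfMemoryError. The function name find_oom_lines is made up for this example, and reading the process directory typically requires root.

```python
import glob
import os

# The JVM reports heap exhaustion with this exception name
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(process_root="/var/run/cloudera-scm-agent/process",
                   name_filter="EVENTSERVER"):
    """Return (path, line_number, text) for each OOM line in matching stderr logs.

    name_filter and the stderr.log file name are assumptions; pass
    name_filter="" to scan every process directory.
    """
    hits = []
    pattern = os.path.join(process_root, "*", "logs", "stderr.log")
    for path in sorted(glob.glob(pattern)):
        # The process directory is two levels above the log file
        proc_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        if name_filter and name_filter not in proc_dir:
            continue
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if OOM_MARKER in line:
                    hits.append((path, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    for path, lineno, text in find_oom_lines():
        print(f"{path}:{lineno}: {text}")
```

An empty result only means no match was found under these assumptions; the Processes tab in Cloudera Manager remains the more reliable place to look.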

Thanks,
Darren

Hi Darren,

 

Awesome! Good to know. Thanks for your help in all this!

 

-Terry