I'm running Cloudbreak through some testing and came across a situation I'm not quite sure how to solve. We're looking to use Cloudbreak to stop clusters when no activity is expected (e.g. evenings and weekends). I've noticed that after a cluster has been stopped for an extended period of time, the "NameNode Last Checkpoint" alert is thrown once it is started again.
I'm not sure what the expected behavior is (a checkpoint on cluster startup?). Is anyone else in a similar situation?
This is expected behavior if your cluster is stopped for an extended period of time. When you start the cluster back up, the last checkpoint is very old relative to the current time on the cluster server, so the alert is raised.
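To illustrate the reasoning above, here is a minimal Python sketch of the age comparison. It is not Ambari's actual alert logic: the function name `checkpoint_age_exceeded`, the `tolerance` multiplier, and the dates are hypothetical, chosen only for illustration. The one-hour default mirrors the HDFS default for `dfs.namenode.checkpoint.period` (3600 seconds).

```python
from datetime import datetime, timedelta


def checkpoint_age_exceeded(last_checkpoint, now,
                            max_delay=timedelta(hours=1), tolerance=2.0):
    """Return True when the last checkpoint is older than tolerance * max_delay.

    max_delay stands in for the configured checkpoint period; tolerance is a
    hypothetical multiplier, not a real Ambari or HDFS setting.
    """
    age = now - last_checkpoint
    return age > max_delay * tolerance


# Hypothetical schedule: cluster stopped Friday 18:00, restarted Monday 08:00.
last_checkpoint = datetime(2024, 1, 5, 18, 0)
restart_time = datetime(2024, 1, 8, 8, 0)

# The clock kept moving while the cluster was down, so the checkpoint is
# roughly 62 hours old the moment the NameNode comes back.
print(checkpoint_age_exceeded(last_checkpoint, restart_time))  # True
```

The point is that the alert compares wall-clock time with the timestamp of the last checkpoint, so any long shutdown makes the checkpoint look stale immediately after restart, regardless of how little happened on the cluster in between.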
Right, that makes sense. What I don't understand is why a checkpoint wouldn't immediately be taken on startup, since it is well past the HDFS Maximum Checkpoint Delay.