Rebooting steps for secondary namenode

avatar
Explorer

Hello Guys,

We are planning to reboot our secondary NameNode. Below is our hdfs-site.xml file. Please let me know the best step-by-step procedure for rebooting the secondary NameNode. Do we have to run "hdfs secondarynamenode -checkpoint" after the reboot, or do we need to check for uncheckpointed transactions before the reboot? Thanks in advance for your help.

=========

<configuration>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/mnt/scecondary/dfs-data</value>
</property>

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>

<property>
  <name>fs.checkpoint.period</name>
  <value>1800</value>
</property>
</configuration>
=================

7 REPLIES

avatar
Mentor
All you need to do is start it up again and make sure its process comes up.
The SNN will automatically run a checkpoint during startup if it determines
that one is necessary. There should be no need to pass a specific flag for
this.

Your configs seem to relate to DataNodes (DNs) rather than the SNN.
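
To make the "ensure its process comes up" step concrete, here is a minimal sketch. It assumes `jps` is on the PATH of the user that owns the daemon; the helper name `snn_pid` is made up for illustration, and the start command depends on how Hadoop was installed.

```shell
#!/bin/sh
# Hypothetical helper: reads `jps`-style output on stdin, prints the PID of
# the SecondaryNameNode JVM, and exits non-zero if no such process is listed.
snn_pid() {
    awk '$2 == "SecondaryNameNode" { print $1; found = 1 }
         END { exit found ? 0 : 1 }'
}

# Usage after the reboot (assumption: a tarball-style install; packaged
# installs usually ship an init script instead):
#   hadoop-daemon.sh start secondarynamenode
#   jps | snn_pid && echo "SNN is running" || echo "SNN is NOT running"
```

No checkpoint flag is passed here, in line with the advice above.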

avatar
Explorer

Hello Harsh,

The issue is that this server is under high load all the time. The configuration does look like a DN's, as you said, but the dfsadmin report is not showing this server.

jps shows:

=======

# jps
8014 SecondaryNameNode
22290 Jps

=======

The SNN process is running:

hdfs 8014 7.6 3.8 1427044 149836 ? Sl 2013 90941:12 java -Dproc_secondarynamenode -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs -Dhadoop.log.file=hadoop-hadoop-secondarynamenode

=======

 

How can I confirm this?

Thanks

 

avatar
Mentor
Could you clarify: confirm what, exactly?

If you meant to ask how to confirm that the SNN works OK, you could check its log (it logs at INFO level every time it completes a checkpoint successfully) or, depending on the version, check its Web UI (default port 50090, IIRC).

As for the load: which process is causing it? Is it the SNN, or something else?

Per your process output though, what version of CDH are you using? Is it CDH3?
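
The checks suggested above could be sketched roughly as follows. The helper name `recent_checkpoints` is hypothetical, the log path is left as a placeholder because the file name is truncated in the process listing, and the search is a plain case-insensitive match on "checkpoint" because the exact message text varies between versions.

```shell
#!/bin/sh
# Hypothetical helper: print the last N (default 5) lines of a log file
# that mention "checkpoint", ignoring case.
recent_checkpoints() {
    log_file=$1
    count=${2:-5}
    grep -i 'checkpoint' "$log_file" | tail -n "$count"
}

# Usage (paths and host are assumptions, adjust to your install):
#   recent_checkpoints /usr/lib/hadoop-0.20/logs/<secondarynamenode log file>
#   ps aux --sort=-%cpu | head -n 5      # which process causes the load (Linux procps)
#   curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50090/   # SNN Web UI, if enabled
```

If the newest checkpoint line is much older than the configured fs.checkpoint.period (1800 seconds in the posted config), that is a sign that checkpoints are not completing.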

avatar

Hi Harsh,
I just observed that in one of our clusters where HA is not enabled, no checkpointing has happened since the very start.
Can you explain this behaviour?

The graph for 'Transactions since last log checkpoint' is a straight, steadily rising line (it has never come down).

 

avatar
Mentor
You should check the Standby NameNode's log for checkpoint-related
messages to ascertain the issue. That is the daemon responsible for
triggering a checkpoint, performing it and uploading it back to the
Active NameNode, much like the Secondary NameNode in a cluster without HA.

Please open up a new topic as it would be unrelated to this one.

avatar

Hi Harsh,

 

The error is:
Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields

There is a mismatch in namespaceID and blockpoolID.
The LV, cTime and clusterID match.

I will post this in a separate thread as well, but since it is urgent, could you please reply here?

I am creating the new thread in parallel now.
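
For anyone comparing these fields by hand: they are stored as key=value lines in the `current/VERSION` file of each storage directory. A minimal sketch of a side-by-side comparison follows; the function names are made up for illustration, and the two file paths depend on the name and checkpoint directories configured on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a VERSION file.
version_field() {
    sed -n "s/^$2=//p" "$1"
}

# Hypothetical helper: list the fields that differ between two VERSION files.
# Returns 0 if all listed fields match, 1 otherwise.
diff_version_fields() {
    status=0
    for key in namespaceID blockpoolID clusterID cTime layoutVersion; do
        a=$(version_field "$1" "$key")
        b=$(version_field "$2" "$key")
        if [ "$a" != "$b" ]; then
            echo "$key: $a != $b"
            status=1
        fi
    done
    return $status
}

# Usage (placeholders, not real paths):
#   diff_version_fields <name dir>/current/VERSION <checkpoint dir>/current/VERSION
```

This only reports which fields differ; it does not change anything on disk.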

 

avatar

Hi Harsh, I have created a new thread with the topic
"Exception in doCheckpoint".