Why does the Secondary NameNode explicitly copy the fsimage from the primary NameNode?

1. Why does the Secondary NameNode explicitly copy the fsimage from the primary NameNode when it already has the same copy of the fsimage?

2. When a cluster is first set up, does the primary NameNode have an fsimage? If so, does it contain any data?

3. It looks like both the primary NameNode and the Secondary NameNode keep all the transaction logs. Do the same logs need to be kept in both places? If so, how many old transactions do we have to keep in the cluster, and is there a configuration setting for this?

1 ACCEPTED SOLUTION

1) Why does the Secondary NameNode explicitly copy the fsimage from the primary NameNode when it already has the same copy of the fsimage?

There is no guarantee that the fsimage on the Secondary NameNode is exactly the same as the one on the primary NameNode. Between checkpoints, the local copy may have been corrupted or lost, for example in a crash. It is safer to fetch the latest available fsimage from the primary NameNode and then merge the edit logs into it.
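The effect of merging edits onto a stale image can be illustrated with a toy model. This is only a sketch: the dictionary-based "image", the `apply_edits` helper, and the sample paths are all hypothetical and do not reflect the real fsimage format or the NameNode's code.

```python
def apply_edits(image, edits):
    """Return a new namespace image with the edit operations applied in order."""
    merged = dict(image)  # copy, so the input image is left untouched
    for op, path in edits:
        if op == "create":
            merged[path] = "file"
        elif op == "delete":
            merged.pop(path, None)
    return merged


# State on the primary NameNode: the latest image plus edits logged since then.
primary_image = {"/": "dir", "/a.txt": "file"}
new_edits = [("create", "/b.txt")]

# Hypothetical stale copy on the secondary that never saw "/a.txt".
stale_secondary_image = {"/": "dir"}

good_checkpoint = apply_edits(primary_image, new_edits)
bad_checkpoint = apply_edits(stale_secondary_image, new_edits)

print(sorted(good_checkpoint))
print(sorted(bad_checkpoint))
```

In this model the checkpoint built from the stale copy silently loses `/a.txt`, which is why starting the merge from the primary's current image is the safer choice.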

2) When a cluster is first set up, does the primary NameNode have an fsimage? If so, does it contain any data?

Yes. When a new NameNode is set up in a new cluster, it has an fsimage with no data in it. The file name looks like fsimage_0000000000000000000, where the numeric suffix is the transaction ID; all zeros means no transactions have been applied.
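As a small illustration of that naming scheme, the sketch below builds and parses such file names. The helper names are hypothetical, and the 19-digit zero padding is an assumption based on the file names commonly seen in NameNode storage directories.

```python
import re

TXID_WIDTH = 19  # assumed zero-padding width of the transaction ID suffix

_FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")


def fsimage_name(txid):
    """Build an fsimage file name for the given transaction ID."""
    return f"fsimage_{txid:0{TXID_WIDTH}d}"


def fsimage_txid(name):
    """Extract the transaction ID from an fsimage file name."""
    match = _FSIMAGE_RE.match(name)
    if match is None:
        raise ValueError(f"not an fsimage file name: {name!r}")
    return int(match.group(1))


print(fsimage_name(0))                 # name of the image on a fresh NameNode
print(fsimage_txid(fsimage_name(42)))  # round trip back to the transaction ID
```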

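3) It looks like both the primary NameNode and the Secondary NameNode keep all the transaction logs. Do the same logs need to be kept in both places? If so, how many old transactions do we have to keep in the cluster, and is there a configuration setting for this?

By default, the NameNode retains about 1 million transactions of edit log history beyond what it needs to restart. Edit log files that contain only older transactions are removed from the NameNode's storage directories (edit logs are kept on local disk, not inside HDFS itself). The limit can be changed with the dfs.namenode.num.extra.edits.retained property in hdfs-site.xml, which defaults to 1000000.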
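The retention rule can be sketched roughly as follows. This is a simplified approximation, not the NameNode's actual implementation: the function name, the `(first_txid, last_txid)` tuple representation of edit log segments, and the sample numbers are hypothetical, and the real logic applies further conditions.

```python
def segments_to_purge(segments, oldest_retained_image_txid,
                      extra_edits_retained=1_000_000):
    """Return the finalized edit segments that are old enough to delete.

    segments: list of (first_txid, last_txid) tuples, one per edit log file.
    oldest_retained_image_txid: txid of the oldest fsimage still kept.
    extra_edits_retained: extra history to keep (assumed default of 1 million).
    """
    # Edits after the oldest retained image are needed to replay from it.
    min_required_txid = oldest_retained_image_txid + 1
    # Keep some additional history before that point, never going below zero.
    purge_before = max(0, min_required_txid - extra_edits_retained)
    return [(first, last) for first, last in segments if last < purge_before]


segments = [(1, 500_000), (500_001, 1_500_000), (1_500_001, 2_500_000)]
print(segments_to_purge(segments, oldest_retained_image_txid=2_500_000))
```

With a larger `extra_edits_retained` value, fewer segments qualify for removal, at the cost of more disk space in the NameNode's storage directories.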
