Created 09-14-2016 02:54 PM
1. Why does the Secondary NameNode explicitly copy the fsimage from the Primary NameNode when it already has the same copy of the fsimage as the primary?
2. When a cluster is first set up, does the Primary NameNode have an fsimage? If yes, does it contain any data?
3. It looks like both the Primary NameNode and the Secondary NameNode maintain all the transaction logs. Is it required to keep the same logs in both locations? If yes, how many old transactions do we have to keep in the cluster? Is there any configuration for this?
Created 09-16-2016 05:31 AM
1) Why does the Secondary NameNode explicitly copy the fsimage from the Primary NameNode when it already has the same copy of the fsimage as the primary?
There is no guarantee that the fsimage on the Secondary NameNode is exactly the same as the one on the Primary NameNode. Between checkpoints, the secondary's copy may have been corrupted or lost in a crash. It is safer to fetch the latest available fsimage from the Primary NameNode and then merge the edit logs into it.
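To illustrate the merge step, here is a minimal sketch in Python. It is a toy model, not Hadoop code: the namespace is assumed to be a plain dict, each edit is a hypothetical `(txid, op, path)` tuple, and `apply_edits` is a made-up helper name.

```python
# Toy model of a checkpoint. The namespace is a dict of path -> last operation,
# and each edit is a (txid, op, path) tuple. This is NOT Hadoop's on-disk format.

def apply_edits(fsimage, last_txid, edits):
    """Replay edits newer than last_txid and return (new_image, new_last_txid)."""
    image = dict(fsimage)  # copy so the input image is left untouched
    for txid, op, path in sorted(edits):
        if txid <= last_txid:
            continue  # this transaction is already reflected in the image
        if op in ("mkdir", "create"):
            image[path] = op
        elif op == "delete":
            image.pop(path, None)
        last_txid = txid
    return image, last_txid


# Image as of transaction 2, plus edits 1..4 fetched from the NameNode.
image_at_2 = {"/data": "mkdir", "/data/a.txt": "create"}
edits = [
    (1, "mkdir", "/data"),
    (2, "create", "/data/a.txt"),
    (3, "create", "/data/b.txt"),
    (4, "delete", "/data/a.txt"),
]

new_image, new_txid = apply_edits(image_at_2, 2, edits)
print(new_txid, sorted(new_image))
```

The merged result is only as good as the base image: if the starting image does not really match the transaction ID it claims, the replay produces a wrong namespace. That is the reason for starting from an image that is known to be good.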
2) When a cluster is first set up, does the Primary NameNode have an fsimage? If yes, does it contain any data?
Yes. When a new NameNode is set up in a new cluster, it has an fsimage with no data in it. The file name looks like fsimage_0000000000000000000, where the all-zero suffix is the transaction ID and indicates that no transactions have been applied.
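As a small illustration of the naming, the sketch below assumes the Hadoop 2 convention in which the suffix is the transaction ID zero-padded to 19 digits. The helper name `image_file_name` is hypothetical.

```python
def image_file_name(txid):
    # Hypothetical helper: "fsimage_" followed by the txid padded to 19 digits
    # (assumed Hadoop 2 naming convention).
    return "fsimage_%019d" % txid


print(image_file_name(0))     # image of a freshly formatted NameNode
print(image_file_name(1234))  # image checkpointed at transaction 1234
```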
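3) It looks like both the Primary NameNode and the Secondary NameNode maintain all the transaction logs. Is it required to keep the same logs in both locations? If yes, how many old transactions do we have to keep in the cluster? Is there any configuration for this?
By default, the NameNode keeps about 1 million transactions of older edit logs beyond what it needs to restart. This is controlled by dfs.namenode.num.extra.edits.retained (default 1000000) in hdfs-site.xml. Edit log files that fall entirely outside that window are purged from the NameNode's local metadata directory; they are not stored as files in HDFS itself.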
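A rough sketch of that retention rule in Python follows. It is a simplification under stated assumptions, not the NameNode's actual implementation: `purgeable_segments` is a hypothetical helper, segments are modelled as `(first_txid, last_txid)` pairs, and other limits the real NameNode may apply are ignored.

```python
def purgeable_segments(segments, oldest_retained_image_txid,
                       extra_edits_retained=1_000_000):
    """Return the finalized edit segments that fall outside the retention window.

    segments: list of (first_txid, last_txid) pairs, one per edit log file.
    """
    # Edits after the oldest retained image are needed to restart from it.
    min_required_txid = oldest_retained_image_txid + 1
    # Keep an extra window of older transactions on top of that.
    purge_before = max(0, min_required_txid - extra_edits_retained)
    # Only segments that end before the window starts can be removed.
    return [seg for seg in segments if seg[1] < purge_before]


segments = [
    (1, 500_000),
    (500_001, 1_200_000),
    (1_200_001, 2_500_000),
    (2_500_001, 3_000_000),
]
print(purgeable_segments(segments, oldest_retained_image_txid=2_500_000))
```

With the oldest retained image at transaction 2,500,000 and the default window of 1,000,000, only segments ending before transaction 1,500,001 are eligible for removal.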