CDH 5.5 NameNode edit logs are not deleted automatically. How can this be fixed?

1 ACCEPTED SOLUTION

Expert Contributor


Edits file: An edits file contains a log of all file system transactions (such as file creation, file deletion, and permission changes) made after the most recent fsimage file.

The checkpointing process periodically merges the content of the most recent fsimage with the edits (containing the new transactions) to create a new fsimage.

Although the edits files are redundant once they have been merged into the fsimage, they are kept for safety and potential recovery purposes; this is part of the regular design. The amount retained should be finite by default, however. Two configuration parameters control this:

a. "dfs.namenode.num.extra.edits.retained" (default 1000000): This determines how many extra transactions to keep, regardless of how many edits files they are spread across.
b. "dfs.namenode.max.extra.edits.segments.retained" (default 10000): This serves as a secondary cap on the former. At most about 10000 extra edits files are kept at any time, even if those files together hold fewer than the roughly 1 million edits allowed by the parameter above.
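To make the interaction between the two limits more concrete, here is a rough Python sketch. It is a simplified model of the retention rule as described above, not the actual Hadoop code. The function names, the representation of an edits file as a (first_txid, last_txid) pair, and the min_required_txid argument (the first transaction after the newest retained fsimage) are all assumptions made for illustration.

```python
def compute_purge_from(segments, min_required_txid,
                       num_extra_edits=1_000_000,
                       max_extra_segments=10_000):
    """Return the lowest transaction ID that would still be retained.

    segments: iterable of (first_txid, last_txid) pairs, one per finalized
    edits file. All names here are hypothetical; this is a simplified model.
    """
    # First limit: keep up to num_extra_edits transactions before the
    # first transaction that is still strictly required.
    purge_from = max(0, min_required_txid - num_extra_edits)

    # "Extra" segments: not required for a restart, but still inside the
    # transaction window computed above.
    extra = sorted(
        (first, last) for first, last in segments
        if last >= purge_from and first < min_required_txid
    )

    # Second limit: if the window spans too many files, drop the oldest
    # ones until only max_extra_segments extra files remain.
    while len(extra) > max_extra_segments:
        purge_from = extra[0][1] + 1
        extra.pop(0)
    return purge_from


def purgeable_segments(segments, purge_from):
    """Segments that lie entirely below purge_from would be purged."""
    return sorted(s for s in segments if s[1] < purge_from)
```

With small numbers the effect is easy to see: given ten files of ten transactions each, a transaction limit of 50 and a segment limit of 3, the segment limit wins and only the three newest extra files are kept.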

On a healthy cluster that checkpoints periodically, each edits file should be no larger than about 2-5 MB, so the overall disk footprint of keeping these edits around is never high enough to cause concern. Also, a situation where these default values needed to be lowered has never come up.

Unnecessary edits (those beyond the configured retention limits) are purged only after each successful checkpoint at the active NameNode, which purges its local edits files and asks the JournalNodes to purge theirs. So if checkpoints are not occurring, the edits files will not be purged.
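One way to check whether checkpoints are keeping up is to compare the edits files in the NameNode's current directory against the newest fsimage. The Python sketch below assumes the usual file naming (edits_<first>-<last> for finalized segments, fsimage_<txid> for checkpoints); the function name and the returned keys are made up for this example, and the directory path is whatever your NameNode data directory is.

```python
import os
import re

# Finalized segments look like edits_0000000000000000001-0000000000000000010.
# edits_inprogress_* and fsimage_*.md5 deliberately do not match.
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")


def summarize_current_dir(path):
    """Count edits segments and find how many are newer than the last fsimage."""
    segments = []
    image_txids = []
    for name in os.listdir(path):
        m = EDITS_RE.match(name)
        if m:
            segments.append((int(m.group(1)), int(m.group(2))))
            continue
        m = FSIMAGE_RE.match(name)
        if m:
            image_txids.append(int(m.group(1)))

    latest_image = max(image_txids) if image_txids else None
    if latest_image is None:
        # No checkpoint found at all: every segment is unmerged.
        unmerged = len(segments)
    else:
        unmerged = sum(1 for first, _ in segments if first > latest_image)

    return {
        "edits_segments": len(segments),
        "latest_fsimage_txid": latest_image,
        "segments_after_latest_fsimage": unmerged,
    }
```

If segments_after_latest_fsimage keeps growing between runs, checkpoints are probably not completing, which would explain why old edits files are never purged.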

These two properties are not commonly changed and are therefore not exposed as separate properties in Cloudera Manager. To change them, add them to the NameNode safety valve ("NameNode Configuration Safety Valve for hdfs-site.xml"); this requires a NameNode restart.

<property>
<name>dfs.namenode.num.extra.edits.retained</name>
<value>1000000</value>
</property>

<property>
<name>dfs.namenode.max.extra.edits.segments.retained</name>
<value>10000</value>
</property>


 


3 REPLIES



Whenever I shut down the NameNode, it runs out of memory on the next startup while trying to load all the edits_ files, and then shuts down. I've had to format the NameNode and copy the data from another source to get the cluster working correctly again. Right now I have about 150k files in the namenode/current directory.


Has anyone had this problem before?