When you ingest data from an edge node that is also running the DataNode role, the first replica of every block is always written to that local DataNode, so it fills up much faster than any other DataNode. To redistribute space usage across all DataNodes, run the HDFS balancer (`hdfs balancer`).
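To show what the balancer is trying to correct, here is a simplified Python sketch of its threshold rule: a DataNode counts as balanced when its utilization is within `threshold_pct` percentage points of the cluster average (the real `hdfs balancer -threshold` option defaults to 10). The function name `classify_datanodes` and its input format are hypothetical. The real balancer also distinguishes above-average and below-average nodes and actually moves blocks, which this sketch does not do.

```python
from typing import Dict, List, Tuple


def classify_datanodes(
    usage: Dict[str, Tuple[float, float]],  # hypothetical input: node -> (used_bytes, capacity_bytes)
    threshold_pct: float = 10.0,            # mirrors the balancer's default threshold
) -> Dict[str, List[str]]:
    """Group DataNodes by how far their utilization is from the cluster average."""
    total_used = sum(used for used, _ in usage.values())
    total_cap = sum(cap for _, cap in usage.values())
    avg_pct = 100.0 * total_used / total_cap

    groups: Dict[str, List[str]] = {"over": [], "under": [], "balanced": []}
    for node, (used, cap) in sorted(usage.items()):
        pct = 100.0 * used / cap
        if pct > avg_pct + threshold_pct:
            groups["over"].append(node)      # candidate source of block moves
        elif pct < avg_pct - threshold_pct:
            groups["under"].append(node)     # candidate target of block moves
        else:
            groups["balanced"].append(node)
    return groups


if __name__ == "__main__":
    # An edge node that also runs a DataNode, next to two ordinary DataNodes.
    print(classify_datanodes({
        "edge1": (900.0, 1000.0),
        "dn1": (300.0, 1000.0),
        "dn2": (300.0, 1000.0),
    }))
```

With the edge node at 90% and the other two nodes at 30%, the cluster average is 50%, so `edge1` lands in `"over"` and `dn1`/`dn2` in `"under"`. That is exactly the skew the first-replica placement produces.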
Hello,

You may want to look at your Secondary/Standby NameNode (NN). There are two files to keep in mind: fsimage and edits. The fsimage is a snapshot of the filesystem namespace, and the edits log records every change made since that fsimage was written.

The Secondary NN checkpoints against the active NN when either of two conditions is met: a configured amount of time has elapsed since the last checkpoint (`dfs.namenode.checkpoint.period`), or a configured number of uncheckpointed transactions has accumulated in the edits (`dfs.namenode.checkpoint.txns`). When this happens, the active NN rolls to a new edits file, and the Secondary NN merges the fsimage and edits in memory and sends the updated fsimage back to the active NN. If for any reason the Secondary NN cannot perform checkpoints (or cannot talk to the active NN), the active NN won't roll its edits file and the edits will keep growing. You can read more about this process in this blog post: http://blog.cloudera.com/blog/2014/03/a-guide-to-checkpointing-in-hadoop/

If that is not the cause, there is also a tunable for how many edits are retained (for example `dfs.namenode.num.extra.edits.retained`): http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
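The checkpoint trigger described above is an OR of two conditions. Here is a minimal Python sketch, assuming the Hadoop 2.x defaults of 3600 seconds for `dfs.namenode.checkpoint.period` and 1,000,000 transactions for `dfs.namenode.checkpoint.txns`. The function `checkpoint_due` is hypothetical and not part of Hadoop. The real Secondary NN evaluates this periodically rather than on every edit.

```python
def checkpoint_due(
    now_s: float,
    last_checkpoint_s: float,
    uncheckpointed_txns: int,
    period_s: float = 3600.0,    # assumed default of dfs.namenode.checkpoint.period
    txn_limit: int = 1_000_000,  # assumed default of dfs.namenode.checkpoint.txns
) -> bool:
    """Return True when either checkpoint condition is met."""
    time_elapsed = (now_s - last_checkpoint_s) >= period_s
    too_many_edits = uncheckpointed_txns >= txn_limit
    return time_elapsed or too_many_edits


if __name__ == "__main__":
    print(checkpoint_due(now_s=3600.0, last_checkpoint_s=0.0, uncheckpointed_txns=0))
    print(checkpoint_due(now_s=100.0, last_checkpoint_s=0.0, uncheckpointed_txns=10))
```

If checkpoints keep failing, `last_checkpoint_s` never advances and `uncheckpointed_txns` keeps rising. The condition stays true, but no checkpoint completes, which matches the growing-edits symptom described above.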