New Contributor
Posts: 5
Registered: 11-16-2016

NameNode is busy and network I/O is high when a DataNode is down

One DataNode went down because of a temporary network disconnection and came back online about 30 to 40 minutes later. During this downtime we observed that the NameNode was busy and unresponsive, and many nodes reported incoming and outgoing traffic of more than 800 Mbps.

We didn't have any jobs running at the time. I understand that HDFS was busy re-replicating the blocks that had fallen below the replication factor, but this significantly degraded the whole cluster. Is this normal?

 

Our replication factor is 3, and each of the 16 nodes holds about 8 TB of data.

We are running CDH 5.3.2. 
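
To put rough numbers on the situation described above, here is a back-of-envelope sketch in Python. It assumes the heartbeat settings are still at what I believe are the stock defaults (dfs.heartbeat.interval = 3 seconds, dfs.namenode.heartbeat.recheck-interval = 300000 ms), that the 15 surviving nodes each sustain the observed 800 Mbps, and that TB and Mbps are decimal units. The function names are made up for illustration, and the results are upper bounds rather than measurements.

```python
# Back-of-envelope estimate of HDFS re-replication after losing one DataNode.
# All defaults and rates here are assumptions, not values read from the cluster.

HEARTBEAT_INTERVAL_S = 3        # assumed default of dfs.heartbeat.interval
RECHECK_INTERVAL_MS = 300_000   # assumed default of dfs.namenode.heartbeat.recheck-interval


def dead_node_timeout_s(heartbeat_s=HEARTBEAT_INTERVAL_S,
                        recheck_ms=RECHECK_INTERVAL_MS):
    """Seconds without heartbeats before the NameNode treats a DataNode as dead."""
    return 2 * recheck_ms / 1000 + 10 * heartbeat_s


def bytes_per_second(rate_mbps):
    """Convert megabits per second (decimal) to bytes per second."""
    return rate_mbps * 1_000_000 / 8


def copied_tb(rate_mbps, seconds, nodes):
    """Upper bound on TB copied if every node sustains rate_mbps for the whole window."""
    return bytes_per_second(rate_mbps) * seconds * nodes / 1e12


def full_rereplication_s(data_tb, nodes, rate_mbps):
    """Seconds to re-create data_tb of lost replicas at the aggregate rate."""
    return data_tb * 1e12 / (nodes * bytes_per_second(rate_mbps))


if __name__ == "__main__":
    timeout = dead_node_timeout_s()
    print(f"dead-node timeout: {timeout:.0f} s")
    for outage_min in (30, 40):
        # Re-replication can only start once the node is declared dead.
        window = max(0.0, outage_min * 60 - timeout)
        print(f"{outage_min} min outage: at most "
              f"{copied_tb(800, window, 15):.2f} TB copied")
    minutes = full_rereplication_s(8, 15, 800) / 60
    print(f"full re-replication of 8 TB: about {minutes:.0f} min")
```

Under these assumptions the node would be declared dead after 630 seconds (10.5 minutes), so a 30 to 40 minute outage leaves plenty of time for re-replication to start, and re-creating all 8 TB of lost replicas would take roughly 89 minutes even with every surviving node pushing 800 Mbps.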
