
NameNode alerting on under-replicated blocks even though dfs.replication was changed on the NN and DNs.

Master Collaborator

Hi,

 

When I run fsck on my cluster, it reports several under-replicated blocks with a target replication of 3, even though I changed dfs.replication to 2 on the NN and DNs.

 

My cluster status

 

Live Nodes: 3 (Decommissioned: 1)

 

 

Total size: 1873902607439 B
Total dirs: 122633
Total files: 117412
Total blocks (validated): 119731 (avg. block size 15650939 B)
Minimally replicated blocks: 119731 (100.0 %)
Over-replicated blocks: 68713 (57.38948 %)
Under-replicated blocks: 27 (0.022550551 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.5738947
Corrupt blocks: 0
Missing replicas: 27 (0.011274004 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jan 30 04:59:23 EST 2017 in 2468 milliseconds
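
One quick way to narrow this down: the second column of a listing is each file's replication factor, so listing one of the affected job output directories shows which files were written asking for 3 replicas (the path below is just a placeholder):

hadoop fs -ls /user/someuser/job-output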

 

 

hdfs-site.xml on the NN and DNs:

 

<property>
<name>dfs.replication</name>
<value>2</value>
</property>
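
As a quick sanity check (assuming the hdfs getconf tool in your CDH4 build supports -confKey), you can print the value each node's client configuration actually resolves:

hdfs getconf -confKey dfs.replication

If this prints 3 on any node, that node is picking up a different hdfs-site.xml than the one above.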

 

The only change I made is that I decommissioned one of the servers, and it is now in the decommissioned state. I also set the replication factor for everything already in HDFS manually to 2 (see the setrep example after the snippet below), but newly written blocks are still alerted with a target replication of 3. I also made sure the MapReduce submit replication is 2 on the JT:

 

<property>
<name>mapred.submit.replication</name>
<value>2</value>
</property>
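
For reference, the manual change mentioned above (setting everything already in HDFS to 2) is normally done with setrep; a sketch, run as the HDFS superuser:

hadoop fs -setrep -R 2 /

This only fixes existing files; the target replication of newly written files is still decided by whichever client writes them.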

 

Any insights?

 

 

1 ACCEPTED SOLUTION

Champion
Let's step back. Instead of trying to hunt down where it is set on the client side, mark dfs.replication as final in your configs. This will prevent any clients from changing it at run time.

<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
</property>


24 Replies

Master Collaborator

The jobs are submitted using Oozie, and the NN and all DNs have the replication factor set. I checked hdfs-site.xml and mapred-site.xml on all the cluster nodes, and all of them have the value 2. Which service should I restart after the change?

 

Champion
Were you checking /etc/hadoop/conf?

Restart Oozie and then find the running process directory under /var/run/cloudera-scm-agent/process. Check the hdfs-site.xml under it to ensure it is set to 2 as well.
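
A quick way to check what the deployed configs actually contain (paths are the usual defaults and may differ on your nodes; the grep assumes one tag per line):

grep -A1 "dfs.replication" /etc/hadoop/conf/hdfs-site.xml
grep -A1 "mapred.submit.replication" /etc/hadoop/conf/mapred-site.xml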

Master Collaborator

Yes, I'm looking at /etc/hadoop/conf.

 

I already tried that and restarted Oozie, with no success.

 

I'm using Hadoop version 2.0.0-cdh4.3.0. I tried to check under the /var/run/mapred dirs but found only a pid file.

 

Under /var/run this is what I see:

 

hald
pm-utils
saslauthd
plymouth
setrans
hadoop-yarn
hadoop-mapreduce
nslcd
console
sepermit
faillock
mdadm
lvm
netreport
ConsoleKit
zookeeper
vmtoolsd.pid
vmware
syslogd.pid
portreserve
auditd.pid
sssd.pid
irqbalance.pid
messagebus.pid
dbus
haldaemon.pid
cupsd.pid
cups
acpid.socket
acpid.pid
xinetd.pid
sshd.pid
nscd
logstash-forwarder.pid
autofs.pid
autofs.fifo-net
autofs.fifo-misc
autofs-running
ntpd.pid
mtstrmd.pid
sm-client.pid
sendmail.pid
abrtd.pid
abrt
hadoop-0.20-mapreduce
crond.pid
cron.reboot
atd.pid
puppet
hsflowd.pid
mcollectived.pid
hadoop-hdfs
zabbix
oozie
utmp

Champion
Did you deploy with Cloudera Manager?

Master Collaborator

No, I'm not using CM.

Champion
Let's step back. Instead of trying to hunt down where it is set on the client side, mark dfs.replication as final in your configs. This will prevent any clients from changing it at run time.

<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
</property>

Master Collaborator

I changed it on all the cluster nodes and restarted all of the cluster services afterwards.

 

It didn't solve the issue.

Master Collaborator

Looking at the conf of one of the running jobs, I see the following properties with replication factor 3 (see the note after the list):

 

mapreduce.client.submit.file.replication

s3.replication

kfs.replication

dfs.namenode.replication.interval

ftp.replication

s3native.replication
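
One thing worth noting, as the usual Hadoop 2 behaviour rather than something confirmed in this thread: mapred.submit.replication is the deprecated name, and MR2 jobs read mapreduce.client.submit.file.replication instead, so that one may need to be set explicitly as well:

<property>
<name>mapreduce.client.submit.file.replication</name>
<value>2</value>
</property>

That only affects the replication of the staged job files, though, not the replication of the job output.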

 

 

 

 

Master Collaborator

Any other ideas?

 

The more interesting part of the issue is that it happens only for the output of specific jobs, and not for all of HDFS.

 

Is there any way to make newly written files in a specific dir use a specific replication factor?

Master Collaborator

Digging into the cluster, I found that one of the applications running outside the Hadoop cluster has clients that do hdfs dfs -put to the cluster. These clients didn't have an hdfs-site.xml, so they used the cluster's default replication factor. What did I do? I tested hdfs dfs -put from a client server inside my cluster and from the client outside the cluster, and noticed that the client outside the cluster put files with replication factor 3. To solve the issue I added an hdfs-site.xml to each of the clients outside the cluster and overrode the default replication factor in that file.
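
Besides shipping an hdfs-site.xml to the external clients, the replication can also be forced per command from a client that has no local config (file name and target path below are placeholders):

hdfs dfs -D dfs.replication=2 -put localfile.dat /data/incoming/

Either way, files those clients already wrote with 3 replicas stay at 3 until setrep is run on them.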