Created 01-30-2017 02:27 AM
Hi,
When I run fsck on my cluster, it reports several under-replicated blocks with a target replication of 3, even though I changed dfs.replication to 2 on the NN and the DNs.
My cluster status:
Live Nodes: 3 (Decommissioned: 1)
Total size: 1873902607439 B
Total dirs: 122633
Total files: 117412
Total blocks (validated): 119731 (avg. block size 15650939 B)
Minimally replicated blocks: 119731 (100.0 %)
Over-replicated blocks: 68713 (57.38948 %)
Under-replicated blocks: 27 (0.022550551 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.5738947
Corrupt blocks: 0
Missing replicas: 27 (0.011274004 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jan 30 04:59:23 EST 2017 in 2468 milliseconds
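For reference, fsck can also be pointed at a specific path to see per-file replication counts; the path below is just a placeholder:

hdfs fsck /user/example/dir -files -blocks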
hdfs-site.xml on the NN and DNs:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
The only change I made was decommissioning one of the servers, which is now in decommissioned state. Even though I set the replication factor manually to 2 for everything in HDFS, I still see newly written blocks flagged with a target replica count of 3. I also made sure the MapReduce submit replication is 2 on the JT:
<property>
<name>mapred.submit.replication</name>
<value>2</value>
</property>
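For context, resetting the replication factor on files that already exist in HDFS is done with setrep, along these lines (the target path / is an assumption):

# recursively reset replication to 2 on existing files;
# this does not affect files written afterwards
hdfs dfs -setrep -R 2 /

Note that setrep only rewrites the factor on existing files; new files take whatever dfs.replication the writing client resolves.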
Any insights?
Created 02-13-2017 09:55 AM
The jobs are submitted using Oozie. I checked hdfs-site.xml and mapred-site.xml on all the cluster nodes (NN and all DNs), and all have the value 2. Which service should I restart after the change?
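One way to double-check what value a node's client configuration actually resolves, rather than reading the XML by hand, is the getconf tool (assuming it is available in this CDH4 build):

hdfs getconf -confKey dfs.replication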
Created 02-13-2017 10:01 AM
Created 02-13-2017 10:24 AM
Yes, I'm looking at /etc/hadoop/conf.
I already tried restarting Oozie, with no success.
I'm using Hadoop version 2.0.0-cdh4.3.0. I tried checking under the /var/run/mapred dirs but found only a pid file.
Under /var/run, this is what I see:
hald
pm-utils
saslauthd
plymouth
setrans
hadoop-yarn
hadoop-mapreduce
nslcd
console
sepermit
faillock
mdadm
lvm
netreport
ConsoleKit
zookeeper
vmtoolsd.pid
vmware
syslogd.pid
portreserve
auditd.pid
sssd.pid
irqbalance.pid
messagebus.pid
dbus
haldaemon.pid
cupsd.pid
cups
acpid.socket
acpid.pid
xinetd.pid
sshd.pid
nscd
logstash-forwarder.pid
autofs.pid
autofs.fifo-net
autofs.fifo-misc
autofs-running
ntpd.pid
mtstrmd.pid
sm-client.pid
sendmail.pid
abrtd.pid
abrt
hadoop-0.20-mapreduce
crond.pid
cron.reboot
atd.pid
puppet
hsflowd.pid
mcollectived.pid
hadoop-hdfs
zabbix
oozie
utmp
Created 02-13-2017 10:27 AM
Created 02-13-2017 11:08 AM
No, I'm not using CM.
Created 02-13-2017 08:55 PM
Created 02-14-2017 01:18 AM
I changed it on all the cluster nodes and restarted all the services on the cluster afterwards.
It didn't solve the issue.
Created 02-14-2017 02:18 AM
Looking at the conf of one of the running jobs, I see the following properties with the value 3 (a possible per-job override is sketched after this list):
mapreduce.client.submit.file.replication
s3.replication
kfs.replication
dfs.namenode.replication.interval
ftp.replication
s3native.replication
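Of these, mapreduce.client.submit.file.replication is the one that governs the replication of job submission files. If the job driver uses ToolRunner, it can in principle be overridden per job via generic options; a sketch with placeholder jar and class names:

# -D must come before the job's own arguments
hadoop jar my-job.jar com.example.MyJob -D mapreduce.client.submit.file.replication=2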
Created on 02-17-2017 11:40 PM - edited 02-18-2017 09:23 AM
Any other ideas?
The most interesting part of the issue is that it happens only for the output of specific jobs, not for all of HDFS.
Is there any way to make newly written files in a specific directory use a specific replication factor?
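As far as I know, HDFS of this vintage has no per-directory default replication factor; the usual workarounds are overriding it at write time or fixing it up afterwards. A sketch with placeholder paths:

# override replication for a single write
hdfs dfs -D dfs.replication=2 -put localfile /some/dir/
# or reset an existing directory tree after the fact
hdfs dfs -setrep -R 2 /some/dir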
Created 02-23-2017 12:49 PM
Digging down in the cluster, I found that one of the applications running outside the Hadoop cluster has clients that do hdfs dfs -put into the cluster. These clients didn't have an hdfs-site.xml, so they picked up the cluster's default replication factor. To confirm, I tested hdfs dfs -put from a client server inside my cluster and from a client outside the cluster, and noticed the outside client wrote files with replication factor 3. To solve the issue, I added an hdfs-site.xml to each of the clients outside the cluster and overrode the default replication factor in that file.
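For anyone hitting the same thing: the fix was a minimal hdfs-site.xml in each external client's config dir, mirroring the cluster-side setting, then verifying a fresh put (the verification path is a placeholder; the second column of the -ls output is the replication factor):

<property>
<name>dfs.replication</name>
<value>2</value>
</property>

hdfs dfs -ls /some/dir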