NameNode alerting on under-replicated blocks even though dfs.replication was changed on the NN and DNs
Labels: HDFS
Created 01-30-2017 02:27 AM
Hi,
When I run fsck on my cluster, it reports several under-replicated blocks with a target replication of 3, even though I changed dfs.replication to 2 on the NN and the DNs.
My cluster status:
Live Nodes: 3 (Decommissioned: 1)
Total size: 1873902607439 B
Total dirs: 122633
Total files: 117412
Total blocks (validated): 119731 (avg. block size 15650939 B)
Minimally replicated blocks: 119731 (100.0 %)
Over-replicated blocks: 68713 (57.38948 %)
Under-replicated blocks: 27 (0.022550551 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.5738947
Corrupt blocks: 0
Missing replicas: 27 (0.011274004 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jan 30 04:59:23 EST 2017 in 2468 milliseconds
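For context, the report above came from a plain check of the namespace root, i.e. something like:

hdfs fsck /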
hdfs-site.xml on the NN and DNs:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
The only change I made was decommissioning one of the servers, which is now in Decommissioned state. Even though I manually set the replication factor to 2 for everything already in HDFS (see the sketch after the snippet below), newly written blocks are still flagged with a target replication of 3. I also made sure the MapReduce submit replication is 2 on the JT:
<property>
<name>mapred.submit.replication</name>
<value>2</value>
</property>
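For reference, the manual replication change I mentioned above was a recursive sweep along these lines (-w waits until re-replication completes, so it can take a while on a large namespace):

hdfs dfs -setrep -R -w 2 /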
Any insights?
Created 02-13-2017 09:55 AM
The jobs are submitted using Oozie, and the NN and all DNs have the replication factor set. I checked hdfs-site.xml and mapred-site.xml on all the cluster nodes and they all have the value 2. Which service should I restart after the change?
Created 02-13-2017 10:01 AM
Restart Oozie, then find the running process directory under /var/run/cloudera-scm-agent/process and check the hdfs-site.xml under it to make sure it is set to 2 as well.
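Something along these lines, assuming a CM-managed node (the role directory name below is just a placeholder):

# newest process directories first
ls -lt /var/run/cloudera-scm-agent/process | head
# check which replication value the role actually loaded
grep -A1 dfs.replication /var/run/cloudera-scm-agent/process/<role-dir>/hdfs-site.xml

You can also ask the client on that node which value it resolves:

hdfs getconf -confKey dfs.replication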
Created 02-13-2017 10:24 AM
Yes, I'm looking at /etc/hadoop/conf.
I already tried that and restarted Oozie, with no success.
I'm using Hadoop version 2.0.0-cdh4.3.0; I tried to check under the /var/run/mapred dirs but found only a pid file.
Under /var/run, this is what I see:
hald
pm-utils
saslauthd
plymouth
setrans
hadoop-yarn
hadoop-mapreduce
nslcd
console
sepermit
faillock
mdadm
lvm
netreport
ConsoleKit
zookeeper
vmtoolsd.pid
vmware
syslogd.pid
portreserve
auditd.pid
sssd.pid
irqbalance.pid
messagebus.pid
dbus
haldaemon.pid
cupsd.pid
cups
acpid.socket
acpid.pid
xinetd.pid
sshd.pid
nscd
logstash-forwarder.pid
autofs.pid
autofs.fifo-net
autofs.fifo-misc
autofs-running
ntpd.pid
mtstrmd.pid
sm-client.pid
sendmail.pid
abrtd.pid
abrt
hadoop-0.20-mapreduce
crond.pid
cron.reboot
atd.pid
puppet
hsflowd.pid
mcollectived.pid
hadoop-hdfs
zabbix
oozie
utmp
Created 02-13-2017 10:27 AM
Are you using Cloudera Manager to manage the cluster?
Created 02-13-2017 11:08 AM
No, I'm not using CM.
Created 02-13-2017 08:55 PM
<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
</property>
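Marking the property final prevents configuration files loaded later (such as a job's copy of the configuration) from overriding it. A quick way to verify the change took effect is to write a test file and read back its actual replication (paths here are just examples):

# write a throwaway file, then print its replication factor
hdfs dfs -put ./somelocalfile /tmp/reptest
hdfs dfs -stat %r /tmp/reptest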
Created 02-14-2017 01:18 AM
I changed it on all the cluster nodes and restarted all the services on the cluster afterwards. It didn't solve the issue.
Created 02-14-2017 02:18 AM
Looking at the configuration of one of the running jobs, I see the following properties with a replication factor of 3:
mapreduce.client.submit.file.replication
s3.replication
kfs.replication
dfs.namenode.replication.interval
ftp.replication
s3native.replication
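If I read the defaults right, most of these look like red herrings: s3.replication, kfs.replication and ftp.replication are stock defaults for other FileSystem implementations, and dfs.namenode.replication.interval is a scan interval in seconds, not a factor. The suspicious one is mapreduce.client.submit.file.replication, but as far as I know that only governs the job's staging files, not its output. To see the actual target replication of the output files, something like this should work (output path is illustrative):

hdfs fsck /data/job-output -files -blocks | grep repl | head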
Created on 02-17-2017 11:40 PM - edited 02-18-2017 09:23 AM
Any other ideas?
The more interesting part of this issue is that it happens only for the output of specific jobs, not for all of HDFS.
Is there any way to make new files written to a specific dir get a specific replication factor?
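From what I can tell, HDFS has no per-directory replication policy; replication is a per-file attribute chosen by the writing client. The closest workaround seems to be sweeping the directory after the job finishes (path illustrative):

hdfs dfs -setrep -R 2 /data/job-output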
Created 02-23-2017 12:49 PM
Digging into the cluster, I found that one of the applications running outside the Hadoop cluster has clients that do hdfs dfs -put into the cluster. These clients didn't have an hdfs-site.xml, so they picked up the default replication factor. To confirm, I tested hdfs dfs -put from a client server inside my cluster and from the client outside the cluster, and the outside client wrote files with replication factor 3. To solve the issue, I added an hdfs-site.xml to each of the clients outside the cluster and overrode the default replication factor in that file.
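For anyone hitting the same thing: the replication factor is decided by the writing client, and a client without hdfs-site.xml falls back to the stock default of dfs.replication=3. Besides shipping a client-side hdfs-site.xml, a one-off override on the command line also works (paths illustrative):

hdfs dfs -D dfs.replication=2 -put localfile /data/incoming/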
