
We have seen performance and stability issues in heavily loaded clusters caused by disk latency on shared disks: frequent NameNode failovers, longer startup times, slower checkpointing, slower edit logging, and high fsync latency causing ZooKeeper session expiry.
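The fsync latency mentioned above is measurable directly: both the NameNode edit log and the ZooKeeper transaction log fsync on every transaction, so high per-fsync latency on a shared disk translates straight into the symptoms listed. A minimal sketch of such a probe (the function name and parameters are our own, not part of any Hadoop tooling) that you can run against a candidate data directory:

```python
import os
import tempfile
import time

def fsync_latency_ms(path, writes=100, size=1024):
    """Append small blocks to `path`, fsync after each, and return
    per-call latency in milliseconds. This mimics the write pattern
    of the NameNode edit log and the ZooKeeper transaction log."""
    latencies = []
    payload = b"\0" * size
    with open(path, "wb") as f:
        for _ in range(writes):
            f.write(payload)
            start = time.perf_counter()
            os.fsync(f.fileno())
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

if __name__ == "__main__":
    # Point this at a file on the disk you want to evaluate.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        probe_file = tmp.name
    try:
        lat = sorted(fsync_latency_ms(probe_file))
        print("p50=%.2f ms  p99=%.2f ms" % (lat[len(lat) // 2], lat[-1]))
    finally:
        os.remove(probe_file)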

We never recommend a shared disk for the NameNode, JournalNode and ZooKeeper; each of these services should have a dedicated disk.

You can choose the following disk types according to your HDFS workload.
[dfs.namenode.name.dir]
 NameNode fsimage directory, dedicated disk => HDD 15K RPM
[dfs.journalnode.edits.dir]
 NameNode edit log directory, dedicated disk => SSD
[dataDir from zoo.cfg]
 ZooKeeper snapshot and transaction logs, for normal usage (ZKFC and HBase) => HDD 15K RPM
 ZooKeeper snapshot and transaction logs, if also used by NiFi/Accumulo/Kafka/Storm in addition to HBase and ZKFC => SSD
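In hdfs-site.xml this layout looks like the fragment below; the mount points are examples only, the point is that each property points at its own dedicated disk:

```xml
<!-- hdfs-site.xml: each metadata directory on its own dedicated mount
     (paths are illustrative examples) -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/hadoop/nn</value>   <!-- dedicated 15K RPM HDD -->
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/hadoop/jn</value>   <!-- dedicated SSD -->
</property>
```

Similarly, set dataDir in zoo.cfg to a directory on its own disk; ZooKeeper also supports a separate dataLogDir if you want to split transaction logs from snapshots onto a faster device.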

If you are using RAID for the metadata directories (dfs.namenode.name.dir and dfs.journalnode.edits.dir), disable RAID and compare the non-RAID performance. There is already strong redundancy for these directories: the fsimage and edits are also available from the Standby NameNode and the remaining JournalNodes in the quorum.

If RAID cannot be disabled for the JournalNodes, consider a different RAID level. RAID 1 or RAID 10 suits dfs.journalnode.edits.dir better than RAID 5, because RAID 5's parity updates increase write latency for small block writes.

If you don't have fast disks, don't use fsimage replication. The NameNode writes to every configured directory, so even one slow disk will degrade write performance.
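For reference, fsimage replication is configured by listing multiple comma-separated directories in dfs.namenode.name.dir, as in the example below (paths are illustrative); the NameNode writes to all of them synchronously, which is why a single slow disk gates the whole operation:

```xml
<!-- Two comma-separated directories replicate the fsimage; the NameNode
     writes to both, so the slower disk determines overall latency.
     Only do this when every listed disk is equally fast. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/hadoop/nn1,/hadoop/nn2</value>
</property>
```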

Version history: Revision 1 of 1, last updated 10-04-2018 11:25 AM