Member since: 03-14-2016
Posts: 53
Kudos Received: 23
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 869 | 09-21-2018 10:02 AM
 | 2115 | 09-11-2018 10:44 AM
 | 2061 | 07-06-2016 01:14 PM
10-04-2018
11:25 AM
1 Kudo
We have seen performance and stability issues on heavily loaded systems due to disk latency on shared disks: frequent NameNode failovers, longer boot times, slower checkpoints, slower logging, higher fsync times causing session expiry, etc. We never recommend a shared disk for the NameNode, JournalNode, and ZooKeeper; each of these services should have a dedicated disk. You can configure the following disk types according to your HDFS workload.
[dfs.namenode.name.dir]
NameNode fsimage directory, dedicated disk => HDD 15K RPM
[dfs.journalnode.edits.dir]
NameNode edit log directory, dedicated disk => SSD
[dataDir from zoo.cfg]
ZooKeeper snapshot and transaction logs, for normal usage from ZKFC and HBase => HDD 15K RPM
ZooKeeper snapshot and transaction logs, if used by NiFi/Accumulo/Kafka/Storm/HBase and ZKFC => SSD
If you are using RAID for the metadata directories (dfs.namenode.name.dir & dfs.journalnode.edits.dir), disable RAID and check the non-RAID performance. The metadata directories already have strong redundancy (the fsimage and edits are also available from the Standby NameNode and the remaining QJNs). If RAID cannot be disabled for the JNs, consider using a different RAID level: RAID 1 and RAID 10 are better suited for dfs.journalnode.edits.dir than RAID 5, due to RAID 5's increased write latency for small block writes. If you don't have fast disks, don't use fsimage replication; it will impact write performance even if just one of the disks is slow.
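As a minimal sketch (the mount points below are hypothetical; adjust them to your hosts), the dedicated-disk layout maps to plain settings. In hdfs-site.xml:
dfs.namenode.name.dir=/hadoop/hdfs/namenode
dfs.journalnode.edits.dir=/hadoop/hdfs/journal
And in zoo.cfg:
dataDir=/hadoop/zookeeper
Each path should be a mount point backed by its own physical disk, matching the disk types above.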
10-04-2018
09:53 AM
@Elias Abacioglu You can refer to this article for guidance on configuring the service RPC port: https://community.hortonworks.com/articles/223817/how-do-you-enable-namenode-service-rpc-port-withou.html
10-04-2018
09:42 AM
1 Kudo
The service RPC port gives the DataNodes a dedicated port to report their status via block reports and heartbeats. The port is also used by the ZooKeeper Failover Controllers for periodic health checks by the automatic failover logic. The port is never used by client applications, so it reduces RPC queue contention between client requests and DataNode messages.
Steps:
1) Ambari -> HDFS -> Configs -> Advanced -> Custom hdfs-site -> Add Property:
dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nn1=<namenode1 host:rpc port>
dfs.namenode.servicerpc-address.<dfs.internal.nameservices>.nn2=<namenode2 host:rpc port>
dfs.namenode.service.handler.count=(dfs.namenode.handler.count / 2)
This RPC port receives all DN and ZKFC requests such as block reports, heartbeats, and liveness reports. Example from hdfs-site.xml:
dfs.nameservices=shva
dfs.internal.nameservices=shva
dfs.ha.namenodes.shva=nn1,nn2
dfs.namenode.rpc-address.shva.nn1=hwxunsecure2641.openstacklocal:8020
dfs.namenode.rpc-address.shva.nn2=hwxunsecure2642.openstacklocal:8020
dfs.namenode.handler.count=200
Service RPC host, port, and handler threads:
dfs.namenode.servicerpc-address.shva.nn1=hwxunsecure2641.openstacklocal:8040
dfs.namenode.servicerpc-address.shva.nn2=hwxunsecure2642.openstacklocal:8040
dfs.namenode.service.handler.count=100
2) Restart the Standby NameNode. You must wait until the Standby NameNode is out of safemode. Note: you can check the safemode status in the Standby NameNode UI.
3) Restart the Active NameNode.
4) Stop the Standby NameNode's ZKFC controller.
5) Stop the Active NameNode's ZKFC controller.
6) Log in to the Active NameNode and reset the NameNode HA state:
# su - hdfs
$ hdfs zkfc -formatZK
7) Log in to the Standby NameNode and reset the NameNode HA state:
# su - hdfs
$ hdfs zkfc -formatZK
8) Start the Active NameNode's ZKFC controller.
9) Start the Standby NameNode's ZKFC controller.
10) Rolling restart the DataNodes.
Note: check that a NodeManager is not installed on the NameNode box, because it uses the same port, 8040. If it is installed, you need to change the service RPC port from 8040 to a different port. Ref: scaling-the-hdfs-namenode
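Once the DataNodes have been rolling-restarted, a quick sanity check (assuming port 8040 and the nn1/nn2 ids from the example above) is to confirm the service RPC port is listening and the HA state is healthy:
# ss -ltnp | grep 8040
# su - hdfs
$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -getServiceState nn2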
10-04-2018
07:41 AM
@Muthukumar Somasundaram Formatting is not an ideal option to solve this issue; in that case, you will lose all your data.
09-21-2018
10:02 AM
@Santanu Ghosh The 1.x branch does not have QJN-based NameNode HA. It is production ready and available only from hadoop-2.x. You can refer to the HDFS HA umbrella JIRA, HDFS-3278.
09-18-2018
09:26 AM
Oops! It looks like these are spam accounts. Thanks @Jordan Moore for the info.
09-17-2018
12:56 PM
2 Kudos
@Harshali Patel HDFS data is distributed across the DataNodes in local filesystem storage. You can configure the list of storage disks with dfs.datanode.data.dir in hdfs-site.xml. dfs.datanode.data.dir determines where on the local filesystem an HDFS DataNode should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
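For example, with three hypothetical mount points, one per physical disk:
dfs.datanode.data.dir=/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data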
09-11-2018
10:44 AM
2 Kudos
You can safely ignore this warning if you have not enabled the service RPC port. This is a dedicated port on the NameNode; DataNodes send liveness and block reports to this queue. The port is never used by client applications, so it reduces RPC queue contention between client requests and DataNode messages. You can refer to this link for more: Scaling Namenode.
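A quick way to check (the nameservice and NameNode id below are placeholders for your own values):
$ hdfs getconf -confKey dfs.namenode.servicerpc-address.<nameservice>.nn1
If the key is missing, getconf reports an error, meaning the service RPC port is not enabled and this warning is expected.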
09-11-2018
09:22 AM
@Jagatheesh Ramakrishnan Appreciate your effort in writing up this data recovery procedure. Can you please add a note to this article? The NameNode should be stopped immediately after the file deletion; otherwise it is hard to recover, because the NameNode will already have sent block deletion requests to the DataNodes, and the physical blocks may then be deleted.
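For example, on the NameNode host (standard Hadoop daemon script; the exact path may vary by distribution):
# su - hdfs
$ hadoop-daemon.sh stop namenode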
09-06-2018
07:28 AM
You can tail the NameNode and DataNode logs, and also redirect the output to a scratch log file during restart:
# tailf <namenode log> > /tmp/namenode-`hostname`.log
# tailf <datanode log> > /tmp/datanode-`hostname`.log