Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1941 | 06-15-2020 05:23 AM |
| | 15708 | 01-30-2020 08:04 PM |
| | 2086 | 07-07-2019 09:06 PM |
| | 8146 | 01-27-2018 10:17 PM |
| | 4625 | 12-31-2017 10:12 PM |
01-14-2019
06:26 PM
1 Kudo
Anyway, why not just do the following steps:
1. Stop all DataNode components and put the host in maintenance mode (I am not sure whether this step is needed).
2. Shut down the DataNode machine.
3. Replace the faulty disk.
4. Start the DataNode machine.
5. Start the DataNode components.
01-14-2019
06:24 PM
In Ambari 2.6.1 I do not see an option to decommission the DataNode.
01-14-2019
06:03 PM
We have an Ambari cluster (Ambari version 2.6.1, HDP version 2.6.4). Each DataNode has 12 disks of 500 GB each. One of the disks is faulty and needs to be replaced. What are the full steps required to replace the faulty disk?
01-14-2019
06:41 AM
@Geoffrey Shelton Okot, do you mean that we need to check the RAM on our DataNode machines? Each machine has 256 GB of memory, of which 198 GB is available. Or do you want us to check something else?
01-13-2019
06:03 PM
We have a Hadoop cluster with DataNode machines.
We noticed that the CPU load average is high on the DataNode machines:
uptime
17:27:46 up 263 days, 3:39, 3 users, load average: 7.94, 6.66, 7.38
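Whether a load average like the one above counts as high depends on how many CPU cores the machine has, and the core count is not given here. A minimal sketch, assuming a Linux host, that divides a load figure by a core count (the function name `load_per_core` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print the load average per core.
#   $1 = load average (for example the 1-minute value from uptime)
#   $2 = number of CPU cores (assumed to be greater than zero)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live Linux host the inputs could come from /proc/loadavg and nproc:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

As a rough guide, a value well above 1.00 suggests more runnable work than cores, while a value well below 1.00 suggests spare capacity.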
After a short check, we noticed that many deleted files are still held open (from lsof), for example:
java 193699 yarn 1082r REG 8,16 293715 0 93588652 /grid/sdb/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr35/block_1186014185 (deleted)
java 193699 yarn 1191r REG 8,80 292993 0 88474445 /grid/sdf/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr35/block_1186014091 (deleted)
java 193699 yarn 1205r REG 8,16 2303 0 93588671 /grid/sdb/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr35/block_1186014185_112276263.meta (deleted)
java 193699 yarn 1265r REG 8,32 23931 0 25962378 /grid/sdc/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr36/block_1186014275 (deleted)
java 193699 yarn 1273r REG 8,32 195 0 25962397 /grid/sdc/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr36/block_1186014275_112276353.meta (deleted)
java 193699 yarn 1307r REG 8,48 66713 0 61461179 /grid/sdd/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr36/block_1186014410 (deleted)
java 193699 yarn 1385r REG 8,48 531 0 61461193 /grid/sdd/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr36/block_1186014410_112276488.meta (deleted)
java 193699 yarn 1477r REG 8,80 2299 0 88474446 /grid/sdf/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr35/block_1186014091_112276169.meta (deleted)
java 193699 yarn 1754r REG 8,16 91051 0 93696129 /grid/sdb/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr37/block_1186014689 (deleted)
java 193699 yarn 1760r REG 8,16 719 0 93696130 /grid/sdb/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr37/block_1186014689_112276769.meta (deleted)
java 193699 yarn 1972r REG 8,48 37960 0 61447490 /grid/sdd/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr39/block_1186015148 (deleted)
java 193699 yarn 1976r REG 8,48 307 0 61447491 /grid/sdd/hadoop/hdfs/data/TY/HK-428352611-43.21.3.46-1502127526112/TY/finalized/patr15/patr39/block_1186015148_112277228.meta (deleted)
To print only the PIDs of the processes that hold deleted files:
lsof +L1 | awk '{print $2}' | sort | uniq
12588
138025
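The pipeline above takes column 2 of every line, so it would also pass through the `PID` token from lsof's header line. A minimal sketch that skips the header and sorts numerically; the function name `deleted_file_pids` is hypothetical, and it reads lsof-style text on stdin so it can be tried on saved output as well:

```shell
#!/bin/sh
# Hypothetical helper: print the unique PIDs found in `lsof +L1`-style output.
# Assumes line 1 is the header and column 2 is the PID.
deleted_file_pids() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }' | sort -un
}

# Typical use on a live host (root is needed to see other users' processes):
#   lsof +L1 | deleted_file_pids
```

lsof also has a terse mode, `lsof -t +L1`, which prints only PIDs with no header and may be enough on its own.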
Since none of the files above exist any more (for example /grid/sdd/hadoop/hdfs/data/current/BP-428352611-43.21.3.46-1502127526112/current/finalized/subdir15/subdir39/blk_1186015148_112277228.meta), we killed all of these PIDs (kill 12588 and so on). After we killed all the PIDs, the CPU load average decreased:
uptime
17:27:46 up 263 days, 3:39, 3 users, load average: 2.24, 4.61, 5.75
What causes the processes to stay up even though the files were deleted? Is it OK to kill them with kill <PID>?
01-07-2019
05:11 PM
@Jay, any comments regarding my last notes?
01-07-2019
12:17 PM
@Jay, I will explain what the issue was. We also tried to stop the Kafka brokers from the CLI with /usr/hdp/current/kafka-broker/bin/kafka stop, but the script returned: Stopping Kafka [82516] failed. So we had no choice: we killed the process with kill -9 and then started the Kafka broker. As you know, the script does not use -9 (it only runs kill <PID>), so we need to find out why an aggressive kill (kill -9) is needed.
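To illustrate the difference between a plain kill and kill -9, here is a minimal sketch that sends SIGTERM first, waits for the process to exit, and only then falls back to SIGKILL. It is not what the HDP stop script does; the function name `stop_with_timeout` and the 30-second default are assumptions made for the example:

```shell
#!/bin/sh
# Hypothetical helper: ask process $1 to exit with SIGTERM, wait up to
# $2 seconds (default 30), and escalate to SIGKILL only if it is still alive.
# Returns 0 on a clean stop, 1 if the signal could not be sent, 2 if escalated.
stop_with_timeout() {
    pid=$1
    timeout=${2:-30}
    kill -TERM "$pid" 2>/dev/null || return 1   # no such process, or not permitted
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # process is gone
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null               # last resort, no clean shutdown
    return 2
}
```

A return code of 2 would mean the process ignored SIGTERM for the whole wait, which is the case worth investigating in the broker logs.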
01-07-2019
06:59 AM
Jay, what we can see in kafka.err is:
/usr/hdp/current/kafka-broker/bin/kafka: line 180: kill: (55636) - Operation not permitted
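"Operation not permitted" from kill normally means the user running the command is neither root nor the owner of the target process. A minimal sketch for checking that before sending a signal, assuming a Linux host with /proc; the function names `process_owner_uid` and `can_signal` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (Linux only): print the real UID that owns process $1,
# read from /proc/<pid>/status. Prints nothing if the process does not exist.
process_owner_uid() {
    awk '/^Uid:/ { print $2; exit }' "/proc/$1/status" 2>/dev/null
}

# Succeeds when the caller is root or has the same UID as the process,
# which is the usual condition for kill to be permitted.
can_signal() {
    owner_uid=$(process_owner_uid "$1")
    [ -n "$owner_uid" ] || return 1          # process not found
    my_uid=$(id -u)
    [ "$my_uid" -eq 0 ] || [ "$my_uid" -eq "$owner_uid" ]
}
```

For example, `can_signal 55636 || echo "run the stop as the process owner or as root"` would flag the mismatch before the stop script fails.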
01-06-2019
09:48 PM
@Geoffrey Shelton Okot, please tell us what else you need.
01-06-2019
09:47 PM
We changed only log_retention_hours to 24 hours, but this is not a problem of a wrong value or parameter.