Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1944 | 06-15-2020 05:23 AM |
| | 15788 | 01-30-2020 08:04 PM |
| | 2092 | 07-07-2019 09:06 PM |
| | 8164 | 01-27-2018 10:17 PM |
| | 4639 | 12-31-2017 10:12 PM |
09-05-2018
10:13 PM
An executor is a distributed agent that is responsible for executing tasks. That much is clear, but how can I know whether there are any issues with the executors that run on the datanode machines? I am asking because when I look on the datanode machines I do not see any logs that represent the executors, and I do not understand how to trace executor problems. The second important question: heartbeats are sent from the executor to the driver. Which logs represent these heartbeats, and how can I know if there is any issue with heartbeat sending?
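For reference, one way to get at executor logs on YARN (a sketch, assuming log aggregation is enabled; the application ID and the HDP log path are placeholders/assumptions, not verified on this cluster):

```bash
# Spark-on-YARN executors log into YARN container logs, not /var/log/spark2.
# After the application finishes, aggregated logs can be pulled with:
yarn logs -applicationId application_1536170000000_0001   # placeholder app ID

# While it runs, each worker keeps them under the NodeManager local log
# dirs (yarn.nodemanager.log-dirs), commonly /hadoop/yarn/log on HDP:
ls /hadoop/yarn/log/application_1536170000000_0001/
```

Heartbeat trouble shows up in those executor logs (for example the `Issue communicating with driver in heartbeater` warning) rather than in a dedicated heartbeat log.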
09-05-2018
09:15 PM
We have an HDP cluster, version 2.6.4, Ambari version 2.6.1, with 8 worker machines (datanode machines). On each worker machine we have the folder /var/log/spark2, but there are no logs under this folder. On the master machines, when the Spark Thrift Server is running, we have /var/log/spark2 and logs are created correctly there, but not on the datanode machines. The Spark Thrift Server was restarted twice, but this did not help to create the logs on the datanode machines. Any other ideas about what we can do?
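A hedged sketch of where the worker-side logs should actually be, assuming a standard HDP 2.6 layout: /var/log/spark2 only fills up where a Spark daemon (Thrift Server, History Server) runs, i.e. on the masters, while executors on the workers write into the YARN NodeManager directories instead. The paths below are typical HDP defaults, not verified on this cluster:

```bash
# On a worker: find the NodeManager log location and look there instead
# of /var/log/spark2.
grep -A1 'yarn.nodemanager.log-dirs' /etc/hadoop/conf/yarn-site.xml
ls /var/log/hadoop-yarn/    # NodeManager daemon logs (assumed HDP default)
ls /hadoop/yarn/log/        # per-application container logs (assumed HDP default)
```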
09-05-2018
09:10 PM
Let me know if you reach any conclusions. As you saw, the configuration in HDFS and in the XML is correct, so I have shown you the real status, and the disks are correctly configured in HDFS.
09-05-2018
09:03 PM
This is the file, and it looks fine:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/sdb/hadoop/hdfs/data,/data/sdc/hadoop/hdfs/data,/data/sdd/hadoop/hdfs/data,/data/sde/hadoop/hdfs/data</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>750</value>
</property>
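A quick check that the configured directories really exist on disk with the expected owner and the 750 mode from dfs.datanode.data.dir.perm (paths taken from the values above; the hdfs:hadoop ownership is the usual HDP convention, assumed here):

```bash
# Each data dir should exist, be owned by hdfs:hadoop, and show drwxr-x---.
ls -ld /data/sd{b,c,d,e}/hadoop/hdfs/data
```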
09-05-2018
09:01 PM
Hi, per your request, this is the file:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/sdb/hadoop/hdfs/data,/data/sdc/hadoop/hdfs/data,/data/sdd/hadoop/hdfs/data,/data/sde/hadoop/hdfs/data</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>750</value>
</property>
09-05-2018
06:38 PM
So what is the final conclusion? Why do we have a gap between the disk sizes and the HDFS capacity as displayed on the Ambari dashboard?
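One way to see where such a gap comes from (a sketch; HDFS counts only the dfs.datanode.data.dir mounts, minus the per-disk reserved space, so its numbers will not match raw df output exactly):

```bash
# Raw capacity of the four data disks on one worker:
df -h /data/sdb /data/sdc /data/sdd /data/sde

# What HDFS itself counts as configured/remaining capacity:
hdfs dfsadmin -report | head -n 20

# Space HDFS deliberately leaves unused on every disk:
grep -A1 'dfs.datanode.du.reserved' /etc/hadoop/conf/hdfs-site.xml
```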
09-05-2018
06:34 PM
As everyone knows, a heartbeat is a signal sent periodically in order to indicate normal operation of a node or to synchronize with other parts of the system.

In our system we have 5 worker machines, while executors run on 3 of them. The system includes 5 datanode machines (workers) and 3 master machines; the Hadoop version is 2.6.4, and the Thrift Server is installed on the first master machine, master1 (the driver also runs on master1).

In Spark, heartbeats are the messages sent by executors (from the worker machines) to the driver (the master1 machine). The message is represented by the case class org.apache.spark.Heartbeat and is received by the driver through the org.apache.spark.HeartbeatReceiver#receiveAndReply(context: RpcCallContext) method. On the driver side, the main purpose of heartbeats is to check whether a given node is still alive (from a worker machine to the master1 machine). The driver verifies this at a fixed interval (defined by the spark.network.timeoutInterval entry) by sending an ExpireDeadHosts message to itself. When that message is handled, the driver checks for executors with no recent heartbeats.

Up to here I have explained the concept. We notice that the messages sent by the executors cannot be delivered to the driver, and in the YARN logs we can see this warning:

WARN executor.Executor: Issue communicating with driver in heartbeater

My question is: what could be the reasons that the driver (master1 machine) does not get the heartbeats from the worker machines?
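For reference, the settings usually inspected first in this situation (a sketch for spark-defaults.conf; the values shown are the documented defaults, used here as illustrations, not recommendations):

```
# How often each executor sends a Heartbeat message to the driver:
spark.executor.heartbeatInterval   10s
# How long the driver tolerates silence before marking an executor dead:
spark.network.timeout              120s
```

Network or firewall problems between the workers and master1, long GC pauses on either side, or an overloaded driver are the kinds of causes these settings commonly interact with.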
09-05-2018
06:04 PM
I still do not see any change for debug in the logs under /var/log/spark2.
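For comparison, the change that is normally expected to take effect (a sketch, assuming the stock log4j file under /etc/spark2/conf on HDP; the daemons must be restarted, and running applications only pick it up after a resubmit):

```
# /etc/spark2/conf/log4j.properties — raise the root logger to DEBUG:
log4j.rootCategory=DEBUG, console
```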
09-05-2018
03:58 PM
No, this does not help. Under /var/log/spark2 on the master machines and under /var/log/spark2 on the datanode machines, we do not see any change in the logs.
09-05-2018
03:55 PM
Yes, we restarted HDFS. It is an automated installation, and the whole lab is set up that way.