Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2092 | 06-15-2020 05:23 AM |
| | 17432 | 01-30-2020 08:04 PM |
| | 2254 | 07-07-2019 09:06 PM |
| | 8711 | 01-27-2018 10:17 PM |
| | 4912 | 12-31-2017 10:12 PM |
09-05-2018
06:34 PM
As we all know, a heartbeat is a signal sent periodically to indicate normal operation of a node or to synchronize with other parts of the system.

In our system we have 5 worker machines, while executors run on 3 of them. The cluster consists of 5 datanode machines (the workers) and 3 master machines, the Hadoop version is 2.6.4, and the Thrift server is installed on the first master machine, master1 (the driver also runs on master1).

In Spark, heartbeats are the messages sent by the executors (on the worker machines) to the driver (the master1 machine). The message is represented by the case class org.apache.spark.Heartbeat and is received by the driver through the org.apache.spark.HeartbeatReceiver#receiveAndReply(context: RpcCallContext) method.

The main purpose of heartbeats is to let the driver check whether a given node is still alive (from worker machine to master1 machine). The driver verifies this at a fixed interval (defined by the spark.network.timeoutInterval entry) by sending an ExpireDeadHosts message to itself. When that message is handled, the driver checks for executors with no recent heartbeats.

Up to here I have explained the concept. We notice that the messages sent by the executors cannot be delivered to the driver, and in the YARN logs we can see this warning:

WARN executor.Executor: Issue communicating with driver in heartbeater

My question is: what could be the reasons that the driver (the master1 machine) does not receive the heartbeats from the worker machines?
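For reference, here is a minimal sketch (not our actual setup) of the heartbeat-related settings mentioned above, set through SparkSession; the interval and timeout values are illustrative assumptions only, not our cluster's configuration:

// Sketch: heartbeat-related settings, with illustrative values
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("heartbeat-tuning-sketch")
  // interval at which each executor sends a Heartbeat message to the driver (default 10s)
  .config("spark.executor.heartbeatInterval", "30s")
  // window after which the driver treats an executor as lost if no heartbeat arrives (default 120s)
  .config("spark.network.timeout", "600s")
  .getOrCreate()

(spark.executor.heartbeatInterval should stay well below spark.network.timeout.)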
09-05-2018
06:04 PM
We still do not see any change in the debug output in the logs under /var/log/spark2.
09-05-2018
03:58 PM
No, this doesn't help. On the master machines under /var/log/spark2, and on the datanode machines under /var/log/spark2, we do not see any changes in the logs.
09-05-2018
03:55 PM
Yes, we restarted HDFS. It is an automated installation, and our whole lab is set up that way.
09-05-2018
03:02 PM
We configured 4 disks! It is not the first time we have configured this, and it is the same on all our lab clusters. Please take a look at this; you can clearly see 4 disks!
09-05-2018
02:38 PM
The last one is: http://<active namenode host>:50070/dfshealth.html#tab-datanode-volume-failures
09-05-2018
02:35 PM
This is what we get from http://xxx.xxx.xxx.xxx:50070/dfshealth.html#tab-datanode
09-05-2018
02:31 PM
this is what we get from "http://<active namenode host>:50070/dfshealth.html#tab-overview"
09-05-2018
02:19 PM
Logging Levels

The valid logging levels are log4j's Levels (from most specific to least):

- OFF (most specific, no logging)
- FATAL (most specific, little data)
- ERROR
- WARN
- INFO
- DEBUG
- TRACE (least specific, a lot of data)
- ALL (least specific, all data)
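As a side note, here is a minimal sketch (not from this thread) of adjusting levels at runtime from a spark-shell session, as an alternative to editing log4j.properties; the logger name and level below are only examples:

// Sketch: changing log4j levels at runtime from spark-shell
// (assumes the shell's built-in SparkContext is available as sc)
import org.apache.log4j.{Level, Logger}

// raise verbosity for Spark internals only
Logger.getLogger("org.apache.spark").setLevel(Level.DEBUG)

// or change the level for everything logged through this SparkContext
sc.setLogLevel("DEBUG")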
09-05-2018
02:18 PM
For example, I changed everything to ALL (the least specific level) to get the most detail in the Spark logs and then restarted Spark, but I do not see the logs giving any more data:
# Set everything to be logged to the console
log4j.rootCategory=ALL, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=ALL
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ALL
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ALL
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ALL
log4j.logger.org.apache.spark.metrics.MetricsConfig=ALL
log4j.logger.org.apache.spark.deploy.yarn.Client=ALL
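One thing worth noting: the ConsoleAppender above targets System.err, so in a YARN deployment its output typically ends up in the container stderr logs rather than under /var/log/spark2. For comparison, here is a minimal log4j.properties sketch (not our actual configuration, and the file path is an assumption) that writes to a file instead of the console:

# Sketch only: route root logging to a rolling file instead of the console
# (the path below is illustrative and must be writable on each node)
log4j.rootCategory=ALL, rollingFile
log4j.appender.rollingFile=org.apache.log4j.RollingFileAppender
log4j.appender.rollingFile.File=/var/log/spark2/spark-debug.log
log4j.appender.rollingFile.MaxFileSize=100MB
log4j.appender.rollingFile.MaxBackupIndex=5
log4j.appender.rollingFile.layout=org.apache.log4j.PatternLayout
log4j.appender.rollingFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n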