Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2092 | 06-15-2020 05:23 AM |
| | 17432 | 01-30-2020 08:04 PM |
| | 2254 | 07-07-2019 09:06 PM |
| | 8711 | 01-27-2018 10:17 PM |
| | 4912 | 12-31-2017 10:12 PM |
09-05-2018
06:34 PM
As we all know, a heartbeat is a signal sent periodically to indicate normal operation of a node or to synchronize with other parts of the system.

In our system we have 5 worker machines, while executors run on 3 of them. The cluster consists of 5 datanode machines (the workers) and 3 master machines, the Hadoop version is 2.6.4, and the Thrift server is installed on the first master machine, master1 (the driver also runs on master1).

In Spark, heartbeats are the messages sent by the executors (on the worker machines) to the driver (the master1 machine). The message is represented by the case class org.apache.spark.Heartbeat and is received by the driver through the org.apache.spark.HeartbeatReceiver#receiveAndReply(context: RpcCallContext) method.

The main purpose of heartbeats is to let the driver check whether a given node is still alive (from worker machine to master1 machine). The driver verifies this at a fixed interval (defined by the spark.network.timeoutInterval entry) by sending an ExpireDeadHosts message to itself. When that message is handled, the driver checks for executors with no recent heartbeats.

Up to here I have explained the concept. We notice that the messages sent by the executors cannot be delivered to the driver, and in the YARN logs we can see this warning:

WARN executor.Executor: Issue communicating with driver in heartbeater

My question is: what could be the reasons that the driver (the master1 machine) does not receive the heartbeats from the worker machines?
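For reference, here is a minimal sketch (not our actual setup) of the heartbeat-related settings mentioned above, set through SparkSession; the interval and timeout values are illustrative assumptions only, not our cluster's configuration:

// Sketch: heartbeat-related settings, with illustrative values
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("heartbeat-tuning-sketch")
  // interval at which each executor sends a Heartbeat message to the driver (default 10s)
  .config("spark.executor.heartbeatInterval", "30s")
  // window after which the driver treats an executor as lost if no heartbeat arrives (default 120s)
  .config("spark.network.timeout", "600s")
  .getOrCreate()

(spark.executor.heartbeatInterval should stay well below spark.network.timeout.)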
09-05-2018
06:04 PM
We still do not see any change in the debug output in the logs under /var/log/spark2.
09-05-2018
03:58 PM
No, this doesn't help. On the master machines under /var/log/spark2, and on the datanode machines under /var/log/spark2, we do not see any changes in the logs.
09-05-2018
03:55 PM
Yes, we restarted HDFS. It is an automated installation, and our whole lab is set up that way.
09-05-2018
03:02 PM
We configured 4 disks! It is not the first time we have configured this, and it is the same on all our lab clusters. Please take a look at this; you can clearly see 4 disks!
09-05-2018
02:38 PM
The last one is: http://<active namenode host>:50070/dfshealth.html#tab-datanode-volume-failures
09-05-2018
02:35 PM
This is what we get from http://xxx.xxx.xxx.xxx:50070/dfshealth.html#tab-datanode
09-05-2018
02:31 PM
this is what we get from "http://<active namenode host>:50070/dfshealth.html#tab-overview"
09-05-2018
02:19 PM
Logging Levels

The valid logging levels are log4j's Levels (from most specific to least):

- OFF (most specific, no logging)
- FATAL (most specific, little data)
- ERROR
- WARN
- INFO
- DEBUG
- TRACE (least specific, a lot of data)
- ALL (least specific, all data)
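As a side note, here is a minimal sketch (not from this thread) of adjusting levels at runtime from a spark-shell session, as an alternative to editing log4j.properties; the logger name and level below are only examples:

// Sketch: changing log4j levels at runtime from spark-shell
// (assumes the shell's built-in SparkContext is available as sc)
import org.apache.log4j.{Level, Logger}

// raise verbosity for Spark internals only
Logger.getLogger("org.apache.spark").setLevel(Level.DEBUG)

// or change the level for everything logged through this SparkContext
sc.setLogLevel("DEBUG")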
09-05-2018
02:18 PM
For example, I changed everything to ALL (the least specific level) to get the most detail in the Spark logs and then restarted Spark, but I do not see the logs giving any more data:
# Set everything to be logged to the console
log4j.rootCategory=ALL, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=ALL
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ALL
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ALL
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ALL
log4j.logger.org.apache.spark.metrics.MetricsConfig=ALL
log4j.logger.org.apache.spark.deploy.yarn.Client=ALL
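One thing worth noting: the ConsoleAppender above targets System.err, so in a YARN deployment its output typically ends up in the container stderr logs rather than under /var/log/spark2. For comparison, here is a minimal log4j.properties sketch (not our actual configuration, and the file path is an assumption) that writes to a file instead of the console:

# Sketch only: route root logging to a rolling file instead of the console
# (the path below is illustrative and must be writable on each node)
log4j.rootCategory=ALL, rollingFile
log4j.appender.rollingFile=org.apache.log4j.RollingFileAppender
log4j.appender.rollingFile.File=/var/log/spark2/spark-debug.log
log4j.appender.rollingFile.MaxFileSize=100MB
log4j.appender.rollingFile.MaxBackupIndex=5
log4j.appender.rollingFile.layout=org.apache.log4j.PatternLayout
log4j.appender.rollingFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n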