Member since 08-08-2017
1652 Posts
30 Kudos Received
11 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1922 | 06-15-2020 05:23 AM |
| | 15491 | 01-30-2020 08:04 PM |
| | 2074 | 07-07-2019 09:06 PM |
| | 8122 | 01-27-2018 10:17 PM |
| | 4575 | 12-31-2017 10:12 PM |
09-06-2018
01:09 PM
@Michael Bronson By default, Spark2 logs at the WARN level. Set it to INFO to get more context on what the driver and executors are doing. Moreover, the logs are available locally on the NodeManager host while the container is still running. The easiest way to view them is to open the Spark UI (the YARN ApplicationMaster UI) and click the Executors tab, where you should see the stderr and stdout for the driver and the executors. Regarding the WARN on heartbeat, we'd need to check what the driver is doing at that point. I think you have already asked another question with more details on the driver and executor.
09-12-2018
04:16 PM
So, if we analyze the GC logs with http://gceasy.io/ and see that the driver isn't doing full garbage collection, what are the next steps we need to take?
09-06-2018
10:01 AM
We need debug logging for the Spark Thrift Server. We have an issue where heartbeats from the DataNode machine do not reach the driver, which is why we need debug mode turned on for the Spark Thrift Server.
10-25-2018
01:29 PM
@Michael Bronson
The warning message means that the executor is unable to send its heartbeat to the driver (this might be a network issue). It is only a warning, but each failure increments the heartbeat failure count, and when the maximum number of failures is reached, the executor fails and exits with an error. There are two configurations that we can tune to avoid this issue:
spark.executor.heartbeat.maxFailures (default value: 60): Number of times an executor will try to send heartbeats to the driver before it gives up and exits (with exit code 56).
spark.executor.heartbeatInterval (default value: 10s): Interval between each executor's heartbeats to the driver. Heartbeats let the driver know that the executor is still alive and update it with metrics for in-progress tasks.
spark.executor.heartbeatInterval should be significantly less than spark.network.timeout.
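To make the relationship between these settings concrete, here is a minimal sketch in Python (not part of Spark) that checks a proposed heartbeat interval against the network timeout and estimates how long an executor keeps retrying before it exits. The helper names, the simplified duration parser, and the `min_ratio=4` threshold for "significantly less" are assumptions made for illustration only. The 120s value is Spark's documented default for spark.network.timeout and does not come from the post above.

```python
import re

# Milliseconds per unit; a simplified subset of the suffixes Spark accepts.
UNIT_MS = {"ms": 1, "s": 1_000, "min": 60_000, "m": 60_000, "h": 3_600_000}


def to_millis(duration):
    """Parse a duration string such as '10s' or '500ms' into milliseconds.

    This sketch requires an explicit unit suffix (an assumption; Spark
    itself applies per-setting default units when the suffix is missing).
    """
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|min|m|h)\s*", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    number, unit = match.groups()
    return int(number) * UNIT_MS[unit]


def check_heartbeat_settings(interval="10s", network_timeout="120s",
                             max_failures=60, min_ratio=4):
    """Report whether the interval is comfortably below the timeout.

    min_ratio is a hypothetical threshold for "significantly less";
    pick whatever margin suits your cluster.
    """
    interval_ms = to_millis(interval)
    timeout_ms = to_millis(network_timeout)
    return {
        # True when the timeout is at least min_ratio times the interval.
        "interval_ok": interval_ms * min_ratio <= timeout_ms,
        # Rough lower bound: ignores the time each failed attempt takes.
        "seconds_until_exit": interval_ms * max_failures / 1000,
    }


if __name__ == "__main__":
    print(check_heartbeat_settings())
    print(check_heartbeat_settings(interval="60s"))
```

With the default values, the check passes and the estimate is 600 seconds, so an executor would need roughly ten minutes of consecutive heartbeat failures before exiting. Raising the interval to 60s without also raising spark.network.timeout fails the check.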
09-05-2018
08:06 AM
@Jay Please let me know if I understand this correctly. Let's say that one of the replicas of spark2-hdp-yarn-archive.tar.gz is corrupted. When I run this command: su - hdfs -c "hdfs fsck /hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz" does it actually mean that fsck will replace the bad replica with a good one, and that the status will finally be HEALTHY?
09-06-2018
07:28 AM
You can tail the NameNode and DataNode logs, and you can also redirect the output to a scratch log file during the restart:
# tail -f <namenode log> > /tmp/namenode-`hostname`.log
# tail -f <datanode log> > /tmp/datanode-`hostname`.log
09-03-2018
04:06 PM
@Jonathan Sneep Thank you so much.
06-12-2019
08:20 PM
Is there any way to restart an ABORTED or FAILED request?
09-01-2018
01:27 AM
Nagios / OpsView / Sensu are popular options I've seen.
StatsD / CollectD / MetricBeat are daemon metric collectors that run on each server (MetricBeat is somewhat tied to an Elasticsearch cluster, though).
Prometheus is a popular option nowadays that scrapes metrics exposed by local services.
I have played around a bit with netdata, though I'm not sure if it can be applied to Hadoop monitoring use cases.
DataDog is a vendor that offers lots of integrations, such as Hadoop, YARN, Kafka, Zookeeper, etc.
...
Realistically, you need some JMX + system monitoring tool, and a bunch exist.
08-28-2018
12:32 PM
@Jay We ran the service check, but it fails with a Python timeout. Is there any other way to get more logging? Second, where do the yarn-yarn-resourcemanager-.... files come from? They are not defined in log4j, so I don't understand how they are created:
-rw-r--r-- 1 yarn hadoop 1847 Aug 27 12:03 yarn-yarn-resourcemanager-master02.sys76.com.out.1
-rw-r--r-- 1 yarn hadoop 1052 Aug 27 12:05 yarn-yarn-resourcemanager-master02.sys76.com.log.10
-rw-r--r-- 1 yarn hadoop 1180 Aug 27 12:05 yarn-yarn-resourcemanager-master02.sys76.com.log.9