Member since 04-30-2018
4 Posts
1 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 10184 | 07-05-2018 06:28 PM |
07-05-2018 06:28 PM
1 Kudo
Short Answer: Turn off scatter-gather.

Long Version: Data transfer between the container and the shuffle service happens through RPC calls (ChunkFetchRequest, ChunkFetchSuccess, and ChunkFetchFailure). On further debugging with trace-level logs, we found that RPC calls were indeed being exchanged between the container and the shuffle service, but after some time they stopped abruptly: no more RPC calls were logged by either the shuffle service or the container.

Looking into the kernel and system activity logs, we found the following:

xen_netfront: xennet: skb rides the rocket: 19 slots

That means our EC2 machines were experiencing network packet loss. More information on this log message can be found here: http://www.brendangregg.com/blog/2014-09-11/perf-kernel-line-tracing.html

So we tried turning off scatter-gather with the following command:

sudo ethtool -K eth0 sg off

The error was gone after that.
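For anyone repeating this diagnosis, the check-then-disable sequence could be sketched roughly as below. The helper names sg_state and count_rocket_warnings are hypothetical (they are not part of ethtool or any Hadoop/Spark tooling), and eth0 is simply the interface used in the post; your interface name may differ. The helpers only parse text from stdin, so nothing here changes the system by itself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; eth0 is the interface from the post.

# Read `ethtool -k <iface>` output on stdin and print the top-level
# scatter-gather state ("on" or "off"). Indented sub-features such as
# tx-scatter-gather are skipped because their first field differs.
sg_state() {
    awk '$1 == "scatter-gather:" { print $2; exit }'
}

# Read kernel log text on stdin and print how many lines carry the
# xen_netfront warning quoted in the post.
count_rocket_warnings() {
    grep -c 'skb rides the rocket' || true
}

# Typical use on the affected host (not run here):
#   dmesg | count_rocket_warnings      # any non-zero count matches the symptom
#   ethtool -k eth0 | sg_state         # state before the change
#   sudo ethtool -K eth0 sg off        # the command from the post
#   ethtool -k eth0 | sg_state         # should now report off
```

Note that a change made with ethtool -K generally does not survive a reboot, so it would need to be reapplied at boot through whatever network configuration mechanism the distribution uses.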
05-03-2018 10:24 AM
I didn't notice that you were only setting YARN_RESOURCEMANAGER_OPTS. This environment variable is used only by the ResourceManager daemon. To specify the opts for all hadoop and yarn client commands, you can use HADOOP_CLIENT_OPTS in hadoop-env.sh:

export HADOOP_CLIENT_OPTS="-Dyarn.resourcemanager.hostname=192.168.33.33"

But I am not sure why you would need to do this when you can just set it in yarn-site.xml, which is the recommended approach.
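If hadoop-env.sh already sets HADOOP_CLIENT_OPTS (for example a heap size), a plain export would overwrite it. A minimal sketch of prepending the property while keeping the existing value is shown below; the initial -Xmx512m value and the RM_HOST variable are assumptions for illustration only, and the address is the one from the post.

```shell
#!/bin/sh
# Hypothetical pre-existing value, present only to show that it is preserved.
HADOOP_CLIENT_OPTS="-Xmx512m"

# ResourceManager address from the post; RM_HOST is an illustrative name.
RM_HOST="192.168.33.33"

# Prepend the property and keep whatever was already in HADOOP_CLIENT_OPTS.
export HADOOP_CLIENT_OPTS="-Dyarn.resourcemanager.hostname=${RM_HOST} ${HADOOP_CLIENT_OPTS}"

echo "$HADOOP_CLIENT_OPTS"
```

Client commands started from a shell that has sourced this would pick up the hostname as a JVM system property, while daemons keep using their own *_OPTS variables.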