Member since: 01-09-2019
401 Posts
163 Kudos Received
80 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2086 | 06-21-2017 03:53 PM |
 | 3177 | 03-14-2017 01:24 PM |
 | 1994 | 01-25-2017 03:36 PM |
 | 3172 | 12-20-2016 06:19 PM |
 | 1594 | 12-14-2016 05:24 PM |
12-03-2015
06:56 PM
The default for this is 10. I have seen it set to 128 in a large (over 1,000 nodes) cluster, and I think this is causing load issues. What is the recommended value, and when should it be increased from the default of 10?
Labels:
- Apache Hadoop
12-02-2015
07:19 PM
When I look at the HDFS audit logs, I see the hbase user from the HBase Master node accessing HDFS files, and the audit log entries show 'cmd=listStatus'. We regularly see about 3 million of these per hour, and we have seen spikes of 6 million per hour, which may have crashed the NameNode. Any idea what the HBase Master is doing here, or whether we can reduce any of this load on the NN?
Labels:
- Apache HBase
11-30-2015
07:56 PM
Thanks Steve. In our case, we are looking to set this at the RM level, not necessarily at the app/AM level: if an AM fails for any reason, just don't retry the AM on the same host; pick something else. Depending on the error, it might also be a good option to blacklist the node at the RM level so no further AMs are sent there.
11-18-2015
12:44 AM
Thanks. Will this change help with all TCP/IP communication, or will it only help with certain traffic such as the MapReduce shuffle?
11-17-2015
11:37 PM
1 Kudo
ipc.server.tcpnodelay was changed to true by default in Hadoop 2.6. We are on Hadoop 2.4 and would like to change it to true. Which services, if any, require a restart for this change? Can it be set at the job level for all jobs without restarting services? On a big cluster where a NameNode restart takes more than 60 minutes, we would like to avoid all possible restarts.
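For reference, this is roughly the change we have in mind. The server-side property would go in core-site.xml, and ipc.client.tcpnodelay is the client-side counterpart which, if I understand correctly, could be passed per job (e.g. with -Dipc.client.tcpnodelay=true) without restarting anything:
<!-- core-site.xml: sketch of the proposed change; property names as in core-default.xml -->
<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>
<property>
  <!-- client-side counterpart; presumably settable per job via -D without a service restart -->
  <name>ipc.client.tcpnodelay</name>
  <value>true</value>
</property>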
Labels:
- Apache Hadoop
11-10-2015
06:10 PM
The question is more about how the MapReduce AM can have a policy of not rerunning the AM on the same node where it failed on the first try. This is not a custom YARN app where we can decide where the AM should go. If the MapReduce AM can't do this today, it might be better to drive a support ticket for an enhancement, since with the current approach a problem on a single NodeManager can cause MapReduce jobs to fail.
11-09-2015
09:39 PM
It has nothing to do with labels; it would be the same issue with Node Labels. Whenever an AM fails for any reason, I see the retry happening on the same node. If the first AM failed because of a node-related issue, the second one will fail for the same reason. What we are looking for is whether any config change can ensure the AM retry does not happen on the same node.
11-09-2015
07:35 PM
I have seen the AM being retried on the same node where the first attempt failed, causing the job to fail. There are situations where something is wrong with the node (space or other issues), so any number of retries there will fail. Is there any way to ensure that AM retries always go to a different NodeManager? Is the current policy to always retry on the same NodeManager?
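For context, the only related knobs I am aware of control how many AM attempts are allowed, not where they are placed; a sketch with what I believe are the default values:
<!-- yarn-site.xml: global cap on AM attempts enforced by the RM (default is 2, I believe) -->
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>2</value>
</property>
<!-- mapred-site.xml: per-job cap for MapReduce AMs (default is 2, I believe) -->
<property>
  <name>mapreduce.am.max-attempts</name>
  <value>2</value>
</property>
Neither of these says anything about which NodeManager the next attempt lands on, which is the part we are trying to influence.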
Labels:
- Apache YARN
11-04-2015
11:05 PM
2 Kudos
REGISTER /tmp/tez-tfile-parser-0.8.2-SNAPSHOT.jar;
yarnlogs = LOAD '/app-logs/hdfs/logs/**/*' USING org.apache.tez.tools.TFileLoader();
lines_with_fetchertime = FILTER yarnlogs BY $2 matches '.*freed by fetcher.*';
This is the Pig code I used to extract specific text from the logs. However, TFileLoader in tez-tools does not seem to scale that well when we pass a folder with a ton of logs. I believe tez-tools is also not part of HDP; you need to build it separately. It worked well on smaller datasets but ran into issues on bigger ones. Thanks
11-04-2015
09:30 PM
Hortonworks documentation (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_yarn_resource_mgt/content/enabling_cgroups.html) says that using CGroups requires HDP to be running in secure mode with Kerberos. However, there is no such requirement according to the Apache documentation: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html The Apache documentation spells out the settings required to run CGroups without Kerberos. Which documentation is incorrect? Do we support running the LinuxContainerExecutor (LCE) and CGroups without Kerberos?
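For reference, this is roughly the yarn-site.xml sketch I put together from the Apache page for LCE plus CGroups without Kerberos; the values below are illustrative, not what we currently run:
<!-- yarn-site.xml: sketch of LCE + CGroups in non-secure mode, based on the Apache NodeManagerCgroups page -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- group that owns the container-executor binary; 'hadoop' is an assumed example value -->
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
<property>
  <!-- in non-secure mode containers run as this single local user by default, as I read the doc -->
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
  <value>nobody</value>
</property>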
Labels:
- Apache YARN