Member since 07-27-2015
35 Posts
2 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
22101 | 04-06-2018 01:05 AM |
04-06-2018
01:05 AM
Hi, This does not seem to work with a later version of CDH (5.13.1). There, we had to set this through the YARN Client Advanced Configuration Snippet (Safety Valve) for yarn-site.xml. So, what is the correct way to set this? Has this really changed in newer releases? Thanks, Sumit
02-06-2017
07:33 PM
@samurai - Yes, there were 2 main issues. One was that these were VMs, and the other was that ZooKeeper was co-located with another service that shared the same disk.
04-07-2016
03:05 AM
No, those are 2 different properties: one for vmem and the other for pmem.
12-02-2015
07:45 PM
Thanks, Harsh. So, to generalize, the mechanism-level subcodes can always be taken as some failure in communicating with the KDC, right? I also see that despite this error, ZK continues to function, so should this error really be treated as serious? Thanks again.
11-23-2015
01:01 AM
I use a kerberized cluster, and once in a while I notice the following error in my ZooKeeper client logs:
15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
15/11/15 15:46:53 ERROR zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
I have the following question about this: the log shows the actual error to be a connection reset, but I am not sure which connection was reset. Is it the connection to the Kerberos KDC? The log further seems to indicate that the connection issue happened when connecting to a ZK quorum member. In that case, was the RST flag received from the ZK quorum member? Thanks, Sumit
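To see how often each mechanism-level subcode shows up before deciding where the reset comes from, the client log can be tallied with a few lines of Python. This is only a sketch: the function name `mechanism_levels` and the regular expression are assumptions made for illustration, based solely on the two log lines quoted above, and are not part of ZooKeeper or the JDK.

```python
import re
from collections import Counter

# Assumed pattern: the GSSException detail appears as "(Mechanism level: <text>)]",
# as in the log lines quoted in this post.
MECHANISM_RE = re.compile(r"\(Mechanism level: (.*?)\)\]")


def mechanism_levels(lines):
    """Count each distinct 'Mechanism level' detail found in the given log lines."""
    counts = Counter()
    for line in lines:
        for detail in MECHANISM_RE.findall(line):
            counts[detail] += 1
    return counts


# One of the lines from this post, shortened after the GSSException part.
sample = [
    "15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: "
    "(java.security.PrivilegedActionException: javax.security.sasl.SaslException: "
    "GSS initiate failed [Caused by GSSException: No valid credentials provided "
    "(Mechanism level: Connection reset)]) occurred when evaluating Zookeeper "
    "Quorum Member's received SASL token.",
]

if __name__ == "__main__":
    for detail, count in mechanism_levels(sample).most_common():
        print(f"{count:5d}  {detail}")
```

Running the same function over a whole client log gives a count per subcode, which shows whether "Connection reset" is the only subcode or one of several.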
Labels:
Apache Zookeeper
10-12-2015
09:27 PM
I am trying to enable HA for the ResourceManager as well as the NameNode. However, the masters fail over to standby very often. There is no issue with HA as such, but every failover ends up exhausting one application attempt. I notice the following issue: a series of slow fsyncs followed (only sometimes) by a CancelledKeyException.
2015-10-12 17:22:41,000 - WARN [SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 6943ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2015-10-12 17:22:41,001 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x1505bcdb3e3054e
2015-10-12 17:22:41,002 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1505bcdb3e30003 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:null Error:KeeperErrorCode = Session moved
2015-10-12 17:22:41,004 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.65.144.35:36030 which had sessionid 0x1505bcdb3e30003
2015-10-12 17:22:41,006 - ERROR [CommitProcessor:3:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:644)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
The time taken is sometimes as high as 10 seconds. This could surely time out the clients, I suppose, leading to deletion of the ephemeral nodes that the masters created. Around this time, the masters switch over. I have seen that disk space is not a concern. However, at times the await time on the drive holding the ZK dataDir does show a surge. I also confirmed that GC pauses are minimal. Any pointers would be really appreciated.
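To line up the slow fsyncs with the failover times, the warnings can be pulled out of the ZooKeeper server log together with their durations. The following is a rough sketch, not a ZooKeeper tool: the function name `slow_fsyncs`, the regular expression, and the 1000 ms default threshold are assumptions for illustration, with the pattern taken from the WARN line quoted above.

```python
import re

# Assumed pattern, derived from the WARN line quoted in this post:
# "<timestamp> - WARN [...] - fsync-ing the write ahead log in <thread> took <N>ms ..."
FSYNC_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - WARN\s.*"
    r"fsync-ing the write ahead log in (?P<thread>\S+) took (?P<ms>\d+)ms"
)


def slow_fsyncs(lines, threshold_ms=1000):
    """Return (timestamp, thread, milliseconds) for fsync warnings at or above threshold_ms.

    threshold_ms is an arbitrary cut-off chosen for this sketch.
    """
    hits = []
    for line in lines:
        match = FSYNC_RE.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            hits.append((match.group("ts"), match.group("thread"), int(match.group("ms"))))
    return hits


# Two of the log lines from this post.
sample = [
    "2015-10-12 17:22:41,000 - WARN [SyncThread:3:FileTxnLog@334] - fsync-ing the "
    "write ahead log in SyncThread:3 took 6943ms which will adversely effect "
    "operation latency. See the ZooKeeper troubleshooting guide",
    "2015-10-12 17:22:41,001 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] "
    "- Processed session termination for sessionid: 0x1505bcdb3e3054e",
]

if __name__ == "__main__":
    for ts, thread, ms in slow_fsyncs(sample):
        print(f"{ts}  {thread}  {ms} ms")
```

Comparing the extracted timestamps against the times the masters switched over would show whether each failover is preceded by an fsync longer than the client session timeout.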
Labels:
Apache Zookeeper
08-25-2015
08:08 AM
I notice that memory on another CDH host is overcommitted. Of the total 62 GB of physical memory, 40 GB is allotted to YARN containers, and that accounts for the major chunk. I notice that my apps remain in the ACCEPTED state despite requesting only 1 GB for each container.
1. Is this 40 GB reserved by YARN at startup?
2. I assume this does not contribute to apps remaining in the ACCEPTED state, because it is from this 40 GB pool that my apps should get memory allocated, right?
Thanks, Sumit
08-20-2015
08:22 PM
So, the only thing that is still not clear to me is why I was allowed a maximum of only 2 apps per user when that was not set anywhere explicitly. Is it possible to infer that from the data given above? Or what other data point(s) influenced limiting it to 2 apps per user? Thanks, Sumit
08-20-2015
07:27 AM
Thank you, Wilfred. Where do I check "Max Application Master Share"? Also, does the Fair Scheduler take independent disks into account when calculating the number of containers possible?
08-13-2015
12:10 AM
OK, I figured out one thing: Num Containers means the total number of running containers. I have 2 independent physical disks on both nodes.