Member since: 07-27-2015
Posts: 35
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9090 | 04-06-2018 01:05 AM
04-06-2018
01:05 AM
Hi, this does not seem to have worked with a later version of CDH (5.13.1). There we had to set this through the YARN Client Advanced Configuration Snippet (Safety Valve) for yarn-site.xml. So, what is the correct way to set this? Has this really changed in newer releases? Thanks, Sumit
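For reference, Cloudera Manager safety-valve entries for yarn-site.xml are pasted as raw XML property blocks. A minimal sketch, using a placeholder property name and value since the actual setting is not named here:

```xml
<!-- Placeholder property; substitute the actual setting this thread is about. -->
<property>
  <name>yarn.example.setting</name>
  <value>example-value</value>
</property>
```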
07-10-2017
09:37 AM
The documentation at https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_ig_req_supported_versions.html#xd_583c10bfdbd326ba--43d5fd93-1410993f8c2--7e14 states: "Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming." Would anyone be able to point me to the known issues with multihoming being referred to here? Would those be in the context of Hadoop/CDH only? Thanks, Sumit
04-25-2017
06:43 AM
One way I have found so far to see the ENV variables is to go to an instance-specific page (such as, say, a NodeManager's): http://inldmcdh1.example.com:7180/cmf/services/7/instances/71/processes

The page shows the following environment variables:

INFA_TRUSTSTORE=/opt/manojkeyfiles
JAVA_HOME=/opt/java/jdk1.8.0_45
YARN_LOG_DIR=/data/hadooplogs/var/log/hadoop-yarn
YARN_ROOT_LOGGER=INFO,RFA
IS_KERBERIZED=false
YARN_LOGFILE=hadoop-cmf-yarn-NODEMANAGER-inldmcdh2.example.com.log.out
NM_LOCAL_DIRS=/data/yarn/nm /dfs/1/yarn/nm /dfs/2/yarn/nm /dfs/3/yarn/nm /dfs/4/yarn/nm /dfs/5/yarn/nm
CDH_VERSION=5
KRB5_CONFIG=/opt/keytab/krb5.conf
YARN_NODEMANAGER_OPTS=-Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dhadoop.event.appender=,EventCatcher -XX:OnOutOfMemoryError={{AGENT_COMMON_DIR}}/killparent.sh
HADOOP_CLIENT_CONF_DIR=/etc/hadoop/conf.cloudera.yarn

Is there a direct API to get the ENV vars from here? Thanks, Sumit
04-25-2017
03:02 AM
Hi, is there a way to programmatically fetch the value of the JAVA_HOME environment variable set on the cluster, using the Cloudera Manager APIs? Thanks, Sumit
02-06-2017
07:33 PM
@samurai - Yes, there were two main issues: one, these were VMs; and two, ZooKeeper was co-located with another service that shared the same disk.
10-13-2016
03:48 AM
If you are only looking to get the list of columns, you can use a degenerate query like:

SELECT * FROM TBL WHERE 1 = 0

and then use java.sql.ResultSetMetaData to iterate over the columns.
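A minimal JDBC sketch of this idea; the Phoenix connection URL and the table name are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class ListColumns {
    public static void main(String[] args) throws SQLException {
        // Placeholder JDBC URL and table; substitute your own.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             // WHERE 1 = 0 matches no rows, so only metadata comes back.
             ResultSet rs = stmt.executeQuery("SELECT * FROM TBL WHERE 1 = 0")) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println(md.getColumnName(i) + " : " + md.getColumnTypeName(i));
            }
        }
    }
}
```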
10-13-2016
03:35 AM
One thing worth noting while debugging here: the table was created via Apache Phoenix, and the table is empty. When we query the table via the Phoenix shell, we get the ThrottlingException; the same query works fine through the HBase shell. I use HBase 1.1.2 and Phoenix 4.7. Has anyone faced this issue before?
09-21-2016
10:31 AM
Hi, our application is unable to scan or read from HBase tables when throttling is set. The ThrottlingException seems to have some correlation with pre-splits, because it shows up as and when the pre-splits for tables are increased. This is despite the fact that the table under consideration is mostly empty. We have already tried both rate limiters, average and fixed. I can't understand why the read rate limit is exceeded when there is hardly any data in HBase. Has anyone faced this issue before? One thing we are planning to try is to reattempt the operations when we get this error; I am not sure whether that will help, but I would like to know how others have gotten this working.

Setup details:
HBase version: 1.1.2
Number of region servers: 4
Number of regions: 116
Heap memory for region server: 2 GB
Quotas set:
TABLE => ns1:table1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
TABLE => ns2:table2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE

Region server stack trace (notice below that the error is about the read size limit being exceeded, while the size of the scan is only 28 (bytes?)):

2016-09-17 22:35:40,674 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] quotas.RegionServerQuotaManager: Throttling exception for user=root table=ns1:table1 numWrites=0 numReads=0 numScans=1: read size limit exceeded - wait 0.00sec
2016-09-17 22:35:40,676 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] ipc.RpcServer: B.defaultRpcServer.handler=55,queue=1,port=58526: callId: 52 service: ClientService methodName: Scan size: 28 connection: 10.65.141.170:42806
org.apache.hadoop.hbase.quotas.ThrottlingException: read size limit exceeded - wait 0.00sec
at org.apache.hadoop.hbase.quotas.ThrottlingException.throwThrottlingException(ThrottlingException.java:107)
at org.apache.hadoop.hbase.quotas.ThrottlingException.throwReadSizeExceeded(ThrottlingException.java:101)
at org.apache.hadoop.hbase.quotas.TimeBasedLimiter.checkQuota(TimeBasedLimiter.java:139)
at org.apache.hadoop.hbase.quotas.DefaultOperationQuota.checkQuota(DefaultOperationQuota.java:59)
at org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(RegionServerQuotaManager.java:180)
at org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(RegionServerQuotaManager.java:125)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2265)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

Thanks, Sumit
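A minimal sketch of the reattempt idea mentioned above. It assumes the client surfaces the server's ThrottlingException directly, which depends on your client retry configuration; the backoff values are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.quotas.ThrottlingException;

public class ThrottledReads {
    // Retry a Get a few times with linear backoff when throttled.
    static Result getWithRetry(Table table, Get get, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return table.get(get);
            } catch (ThrottlingException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after maxAttempts
                }
                Thread.sleep(100L * attempt); // back off before retrying
            }
        }
    }
}
```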
04-07-2016
03:05 AM
No, those are two different properties: one for vmem (virtual memory) and the other for pmem (physical memory).
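Assuming the properties in question are YARN's memory-check switches (my inference; the thread does not name them here), the pair would look like this in yarn-site.xml:

```xml
<!-- Assumed property names: YARN's virtual- and physical-memory checks. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>
</property>
```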
12-02-2015
07:45 PM
Thanks Harsh. So, to generalize, can the mechanism-level subcodes always be taken as some failure in communicating with the KDC? I also see that despite this error, ZK does continue to function, so should this error really be treated seriously? Thanks again.
11-23-2015
01:01 AM
I use a kerberized cluster, and once in a while I notice the following error in my ZooKeeper client logs:

15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)] ) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
15/11/15 15:46:53 ERROR zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)] ) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

My doubt is this: the log shows the actual error to be a connection reset, but I am not sure which connection was reset. Was it the connection to the Kerberos KDC? The log further seems to indicate that the connection issue happened when connecting to a ZK quorum member; in that case, was the RST flag received from the ZK quorum member? Thanks, Sumit
10-13-2015
01:27 AM
I researched a bit more; however, I am unable to find any means of using prepared statements with the Phoenix input format. Does anyone have contrary comments?
10-12-2015
09:27 PM
I am trying to enable HA for the Resource Manager as well as the NameNode. However, the masters fail over to standby very often. There is no issue with HA as such, but every failover ends up exhausting one application attempt. I notice the following issue: a series of slow fsyncs, followed (sometimes only) by a CancelledKeyException.

2015-10-12 17:22:41,000 - WARN [SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 6943ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2015-10-12 17:22:41,001 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x1505bcdb3e3054e
2015-10-12 17:22:41,002 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1505bcdb3e30003 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:null Error:KeeperErrorCode = Session moved
2015-10-12 17:22:41,004 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.65.144.35:36030 which had sessionid 0x1505bcdb3e30003
2015-10-12 17:22:41,006 - ERROR [CommitProcessor:3:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:644)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)

The fsync time is sometimes as high as 10 seconds. This could surely time out the clients, I suppose, leading to the deletion of the ephemeral nodes the masters created; around this time, the masters switch over. I have seen that disk space is not a concern. However, at times the await time on the ZK dataDir drive does show a surge. I have also confirmed that GC pauses are minimal. Any pointers would be really appreciated.
10-09-2015
07:33 AM
I want to use a PreparedStatement when setting a query for the Phoenix record reader / calculating input splits. I am not able to find any means other than this:
PhoenixConfigurationUtil.setInputQuery(conf, query);
which takes in a raw SQL query string. How do I change this to a PreparedStatement rather than a hard-coded statement?
Thanks, Sumit
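Not an answer, but a sketch of one conceivable workaround: since setInputQuery only accepts a string, the parameter values can be rendered into the SQL before handing it over. The table, column, and escaping here are hypothetical and simplified:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;

public class InlineQueryParams {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical parameter value; naively escape single quotes before inlining.
        String city = "Pune".replace("'", "''");
        String query = "SELECT id, name FROM CITIES WHERE city = '" + city + "'";
        PhoenixConfigurationUtil.setInputQuery(conf, query);
    }
}
```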
09-22-2015
03:57 AM
Hi, I have used various hbck options with varied success when the HBase Master fails to start up after a shutdown. Using zkCli to clear znodes has also helped. However, it has been more trial and error than method. Is there any recommendation document on which options to use for such startup errors that are seemingly related to data corruption (such as, say, TableExistsException, TableDoesNotExistException, table enabled/disabled exceptions, etc.)? Also, is it right to assume that these can only happen due to an unclean shutdown of HBase? Finally, would using zkCli to clear (rmr) the entire /hbase cause data loss, since we would lose all znodes related to HBase (including meta)? Thanks, Sumit
09-14-2015
09:02 PM
Now, my plan is to use Phoenix 4.5 with CDH 4.2. Can someone confirm that Kerberos is not a concern?
09-01-2015
02:29 AM
I am planning on using Phoenix with CDH 5.4. Can you confirm the following, please:
1. Can I use Phoenix 4.2 with CDH 5.4?
2. What are the steps to get it working with Kerberos in CDH?
I am planning on using the JDBC APIs, but I do notice some forums that describe issues in some Phoenix classes with Kerberos enabled. Can you please confirm whether Kerberos is fully supported? Thanks, Sumit
08-25-2015
08:08 AM
I notice that memory on another CDH host is overcommitted. Of the total 62 GB of physical memory, 40 GB is allotted to YARN containers, and that consumes the major chunk. I notice that my apps remain in the ACCEPTED state despite requesting only 1 GB for each container.
1. Is this 40 GB reserved by YARN at startup?
2. Also, I assume that this does not contribute to the apps remaining in the ACCEPTED state, because it is from this pool of 40 GB that my apps should get their memory allocated, right?
Thanks, Sumit
08-20-2015
08:22 PM
So, the only thing that is not clear to me now is why I was allowed a maximum of only 2 apps per user when that was not set anywhere explicitly. Is it possible to infer that from the data given above? Or what other data point(s) influenced limiting it to 2 apps per user? Thanks, Sumit
08-20-2015
07:27 AM
Thank you Wilfred. Where do I check the "Max Application Master Share"? Also, does the Fair Scheduler take independent disks into account when calculating the number of possible containers?
08-13-2015
12:10 AM
OK, I figured one thing out: Num Containers refers to the total number of running containers. I have 2 independent physical disks on both nodes.
08-12-2015
10:40 PM
I have a 2-node cluster, with each node having 8 GB RAM and 4 cores. On both nodes there are apps running that have consumed 2 cores each, which leaves me with 2 cores (x 2) on both nodes. Memory used is 4 GB of the total 16 GB available to YARN containers. Some important properties:

yarn.nodemanager.resource.memory-mb = 20GB (overcommitted, as I see)
yarn.scheduler.minimum-allocation-mb = 1GB
yarn.scheduler.maximum-allocation-mb = 5.47GB
yarn.nodemanager.resource.cpu-vcores = 12
yarn.scheduler.minimum-allocation-vcores = 1
yarn.scheduler.maximum-allocation-vcores = 12

I am using the Fair Scheduler. With the above settings, when I spark-submit, the app remains in the ACCEPTED state. Here is what I am requesting:

spark.driver.memory=2G
spark.master=yarn-client
spark.executor.memory=1G
num-executors = 2
executor-memory = 1G
executor-cores = 1

As I see it, I am requesting a total of 3 cores (1 for the driver, by default, and 1 x 2 for the executors). A single node does not have 3 free cores, but it does have 2, so ideally I should see them distributed across the 2 nodes. I am not sure why the Spark job should remain in the ACCEPTED state; my default queue shows only 25% usage. I also notice the following settings for my root.default queue:

Used Capacity: 25.0%
Used Resources: <memory:4096, vCores:4>
Num Schedulable Applications: 2
Num Non-Schedulable Applications: 1
Num Containers: 4
Max Schedulable Applications: 2
Max Schedulable Applications Per User: 2

Why do I get only 4 containers in total? Or does this indicate the currently used containers (which in my case is 4)? Also, why is the maximum number of schedulable apps only 2? I have not set any user-level or queue-level limits under the Dynamic Resource Pool settings.
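For reference, the spark-submit invocation implied by the properties above might look like the following; the application class and jar are placeholders:

```
spark-submit \
  --master yarn-client \
  --driver-memory 2G \
  --num-executors 2 \
  --executor-memory 1G \
  --executor-cores 1 \
  --class com.example.MyApp \
  myapp.jar
```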
08-10-2015
09:35 AM
I also think we could probably compress the binaries before they are copied to HDFS and have YARN uncompress them somehow?
08-07-2015
08:16 PM
Are there any recommendations for speeding up the deployment of app binaries to YARN? I've been using the RM REST APIs to submit apps, with the binaries located on HDFS. This tends to take a lot of time when the binaries to be deployed as a YARN app are big (say, 500 MB or more) and when the number of containers I need is high. I could probably speed this up by:
1. Turning off the default 3 copies needed on HDFS (see the sketch after this list)
2. Using the HDFS cluster-wide cache, which can help avoid block reads
3. Using YARN resource localization
Do you have any recommendations that are definitely known to speed this up? Thanks, Sumit
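A minimal sketch of option 1, assuming the binaries live under a hypothetical path like /apps/myapp; HDFS replication can be set per file rather than cluster-wide:

```
# Upload with a replication factor of 1 instead of the default 3
hdfs dfs -D dfs.replication=1 -put myapp.tar.gz /apps/myapp/

# Or lower the replication of an already uploaded file
hdfs dfs -setrep -w 1 /apps/myapp/myapp.tar.gz
```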
07-30-2015
02:26 AM
OK, I probably did not know the full details. But doesn't YARN pre-empt containers? If so, what does pre-emption mean for the app? The idea behind knowing this programmatically is to help debug faster and also to make decisions about how to prevent it next time. Many times, on bigger clusters, the exact reason behind a termination can be difficult to determine and can take some time.
07-29-2015
09:41 PM
Does the Fair Scheduler take only memory into consideration when making a decision, or does it also use vcores? If the decision can depend on multiple factors, then this may be another CR wherein the user can get to know the exact reason (possibly through an API call) why an app is in the ACCEPTED state (such as memory, cores, disk space, queue limits, etc.).
07-29-2015
08:48 PM
You beat me to the answer 🙂 Yes, I figured this has to be set in the NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml. Thanks!
07-29-2015
08:43 PM
Thanks Wilfred. I'd agree about not setting it to false; that's my thinking too. The main reason to use that setting is to be able to do some functional testing without getting into tuning just yet. So, is there a way I can set this property through the UI?
07-29-2015
08:41 PM
Currently, there are multiple reasons for an app to get killed, some of which are:
1. Natural termination because the app is over.
2. A container got killed for exhausting various limits, such as the physical memory limit, the virtual memory limit, etc.
3. The AM got killed more than max-attempts times.
4. Queue-level limits were reached.
5. The overall maximum allowed container memory for a given NodeManager was reached.
6. Pre-emption.
7. Invalid security tokens.
8. An admin killed it through the UI.
9. Recovery of apps is not supported when the RM or NM restarts.
10. Etc.
These show up in the logs, but is there a way we can determine them programmatically, say through a Resource Manager REST API call (see the sketch after this list)? Perhaps its diagnostics field could be used to populate the exact reason whenever YARN is aware of it?
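A minimal sketch of such a call, reading an app's report (including its diagnostics field) from the RM REST API; the RM host/port and application id are placeholders, and how much detail the diagnostics field carries varies case by case:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class AppDiagnostics {
    public static void main(String[] args) throws Exception {
        // Placeholder RM host/port and application id.
        URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/application_1438000000000_0001");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // The JSON app report includes "state", "finalStatus", and "diagnostics".
                System.out.println(line);
            }
        }
    }
}
```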