About sumit_nigam

sumit_nigam · ‎01-22-2017

I am launching HBase (1.1.2) on a kerberized cluster (AD). The HBase region server fails to connect to the master with the following error:

2017-01-20 18:17:23,944 WARN [regionserver/a1.example.com/xxxxx] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: java.io.IOException: Couldn't setup connection for srvuser/a1.example.com@ADC.EXAMPLE.COM to srvuser/a2.example.com@ADC.EXAMPLE.COM
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2270)
    ...
Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.readStatus(HBaseSaslRpcClient.java:153)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:189)

I turned on detailed debug logs for Kerberos as well as HBase.
I can see that the service ticket is successfully obtained by host a1 for a2:

Found ticket for srvuser/a1.example.com@ADC.EXAMPLE.COM to go to krbtgt/ADC.EXAMPLE.COM@ADC.EXAMPLE.COM expiring on Sat Jan 21 04:17:10 PST 2017
Found ticket for srvuser/a1.example.com@ADC.EXAMPLE.COM to go to srvuser/a2.example.com@ADC.EXAMPLE.COM expiring on Sat Jan 21 04:17:10 PST 2017
Client Principal = srvuser/a1.example.com@ADC.EXAMPLE.COM
Server Principal = srvuser/a2.example.com@ADC.EXAMPLE.COM
Session Key = EncryptionKey: keyType=23 keyBytes

I do not see any errors after the above lines in the detailed Kerberos-level logs, so I assume that the "GSS initiate failed" problem no longer has anything to do with Kerberos; otherwise I would have seen some error reported (such as, say, the ticket being corrupted?). I notice that a "GSS initiate failed" message reported without any details is described by experts as one of the most useless messages (Steve's "error messages to fear").

I have already verified that the unlimited JCE policy files are present and that both hosts are using the same encryption algorithm. Can anyone help here, even if only with what next steps I can take to debug this? Thank you!
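When reading long Kerberos debug output, it can help to extract the ticket lines mechanically and confirm which client/server principal pairs actually appear. The following is a minimal sketch, assuming each "Found ticket for ..." entry sits on its own line as in the excerpt above; the function names `parse_tickets` and `has_service_ticket` are hypothetical helpers, not part of any Kerberos or HBase tooling.

```python
import re

# Matches the "Found ticket for <client> to go to <server> expiring on <date>"
# lines quoted in the post. The layout is taken from that excerpt only.
TICKET_RE = re.compile(
    r"Found ticket for (?P<client>\S+) to go to (?P<server>\S+) "
    r"expiring on (?P<expiry>.+)"
)

def parse_tickets(log_text):
    """Return one dict (client, server, expiry) per ticket line in the log text."""
    tickets = []
    for line in log_text.splitlines():
        match = TICKET_RE.search(line)
        if match:
            tickets.append(match.groupdict())
    return tickets

def has_service_ticket(tickets, client, server):
    """True if a ticket from `client` to `server` appears in the parsed list."""
    return any(t["client"] == client and t["server"] == server for t in tickets)

# Sample text copied from the debug output in the post.
SAMPLE = """\
Found ticket for srvuser/a1.example.com@ADC.EXAMPLE.COM to go to krbtgt/ADC.EXAMPLE.COM@ADC.EXAMPLE.COM expiring on Sat Jan 21 04:17:10 PST 2017
Found ticket for srvuser/a1.example.com@ADC.EXAMPLE.COM to go to srvuser/a2.example.com@ADC.EXAMPLE.COM expiring on Sat Jan 21 04:17:10 PST 2017
Client Principal = srvuser/a1.example.com@ADC.EXAMPLE.COM
"""

for ticket in parse_tickets(SAMPLE):
    print(ticket["client"], "->", ticket["server"], "| expires", ticket["expiry"])
```

This only confirms that a ticket was found on the client side; it says nothing about whether the server accepted it.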

sumit_nigam · ‎09-22-2016

@Artem Ervits @Neeraj Sabharwal - I am trying to use size-based throttling but keep getting ThrottlingException when I start HBase, even when there is hardly any data in HBase. I am sure this is a misconfiguration on my end, but I cannot find it. Any input would be appreciated. I should add that there seems to be some correlation between the number of pre-splits and the throttling size limit, because the error shows up only when the number of pre-splits is higher.

Details:
HBase version: 1.1.2
Number of region servers: 4
Number of regions: 116
Heap memory for region server: 2GB

Quotas set:
TABLE => ns1:table1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
TABLE => ns2:table2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE

Region server stack trace (notice below that the error is about the read size limit being exceeded, yet the size of the scan is only 28 (bytes?)):

2016-09-17 22:35:40,674 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] quotas.RegionServerQuotaManager: Throttling exception for user=root table=ns1:table1 numWrites=0 numReads=0 numScans=1: read size limit exceeded - wait 0.00sec
2016-09-17 22:35:40,676 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] ipc.RpcServer: B.defaultRpcServer.handler=55,queue=1,port=58526: callId: 52 service: ClientService methodName: Scan size: 28 connection: 10.65.141.170:42806
org.apache.hadoop.hbase.quotas.ThrottlingException: read size limit exceeded - wait 0.00sec
    at org.apache.hadoop.hbase.quotas.ThrottlingException.throwThrottlingException(ThrottlingException.java:107)
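To put the numbers in the post side by side, here is a rough sketch that converts a limit string such as `10G/sec` into bytes per second and pulls the counters out of the "Throttling exception" log line. It assumes the size suffixes are powers of 1024, which may not match how HBase itself interprets them; `limit_to_bytes_per_sec` and `parse_throttle_line` are hypothetical helper names.

```python
import re

# ASSUMPTION: size suffixes are binary multiples (1K = 1024 bytes).
UNIT_BYTES = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}
PERIOD_SECONDS = {"sec": 1, "min": 60, "hour": 3600, "day": 86400}

def limit_to_bytes_per_sec(limit):
    """Convert a limit string like '10G/sec' into bytes per second."""
    match = re.fullmatch(r"(\d+)([BKMGTP])/(sec|min|hour|day)", limit)
    if match is None:
        raise ValueError(f"unrecognized limit: {limit!r}")
    value, unit, period = match.groups()
    return int(value) * UNIT_BYTES[unit] / PERIOD_SECONDS[period]

# Layout copied from the RegionServerQuotaManager DEBUG line in the post.
THROTTLE_RE = re.compile(
    r"Throttling exception for user=(?P<user>\S+) table=(?P<table>\S+) "
    r"numWrites=(?P<writes>\d+) numReads=(?P<reads>\d+) numScans=(?P<scans>\d+): "
    r"(?P<reason>.+)"
)

def parse_throttle_line(line):
    """Return the fields of a throttling log line as a dict, or None if no match."""
    match = THROTTLE_RE.search(line)
    return match.groupdict() if match else None

LOG_LINE = (
    "quotas.RegionServerQuotaManager: Throttling exception for user=root "
    "table=ns1:table1 numWrites=0 numReads=0 numScans=1: "
    "read size limit exceeded - wait 0.00sec"
)

limit = limit_to_bytes_per_sec("10G/sec")
event = parse_throttle_line(LOG_LINE)
print(f"configured limit: {limit:.0f} bytes/sec")
print(f"throttled table: {event['table']}, scans in request: {event['scans']}")
```

Under that assumption the configured limit is many orders of magnitude above the 28-byte scan request in the log, which is consistent with the suspicion that the exception comes from how the quota is applied rather than from actual traffic volume.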

sumit_nigam · ‎08-03-2016

Hi @billie - Thanks. Actually, I was able to get that part working (and yes, the changes are needed in both appConfig and metainfo). However, when more than one region server is started on the same host (on different ports), Slider reports the wrong port for the first region server. The other region server ports are correct. I think this is probably a bug in Slider.

sumit_nigam · ‎07-31-2016

I am using HBase 0.98 and Slider 0.81.1. I want to use the Slider REST APIs to get the port numbers of the region server instances deployed through Slider. I assume the API to use is https://inldmqarh71n2:8090/proxy/application_1467115608017_0178/ws/v1/slider/publisher/exports/servers However, I get a NullPointerException when I issue this API call. Do I need to specify anything in metaInfo.xml to make this work? Or is it a Slider version issue?
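For reference, the call described above could be scripted roughly as follows. This is a sketch only: the URL path is the one quoted in the post, while the helper names `exports_url` and `fetch_export` are hypothetical, and the assumption that the endpoint answers with a JSON body is not confirmed by the post.

```python
import json
import urllib.request

def exports_url(proxy_base, app_id, export_name):
    """Build the publisher exports URL in the shape quoted in the post."""
    return (
        f"{proxy_base.rstrip('/')}/proxy/{app_id}"
        f"/ws/v1/slider/publisher/exports/{export_name}"
    )

def fetch_export(proxy_base, app_id, export_name, timeout=10):
    """Fetch one export and decode it. ASSUMPTION: the response body is JSON."""
    url = exports_url(proxy_base, app_id, export_name)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

# Host and application id are the ones from the post.
print(exports_url("https://inldmqarh71n2:8090", "application_1467115608017_0178", "servers"))
```

On a kerberized or TLS-protected cluster the plain `urlopen` call would likely need additional authentication and certificate setup, which is left out here.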

sumit_nigam · ‎06-08-2016

I have the following questions about SmartSense. Would anyone who has used it be able to help?

1. We host HBase as a YARN app and use Slider for this. I notice that SmartSense supports HBase monitoring/troubleshooting. Does that extend to HBase on YARN too?
2. Does SmartSense help with piecing together troubleshooting information from many different logs? For example, a YARN container app may be down because the YARN node manager went down, which in turn may be down because the YARN RM terminated all apps on that node manager. Piecing this information together today requires looking into the resource manager/node manager logs along with the HBase logs. Another case is an app going down because ZooKeeper has hit the maxClientCnxns limit and will not allow any more incoming connections from that host. Those are just a representative set of problems. Does SmartSense help there?
3. Does SmartSense also help identify issues such as Kerberos ticket renewal issues, SSL issues, and open file handle issues?

Thanks, Sumit

sumit_nigam · ‎05-03-2016

@billie - Thank you for the info. So, it is exactly as I thought. And in my opinion ps is completely wrong in the context of hbase because even with ps coming back successfully, the region server is dead for all practical purposes. Unfortunately, because of this my idea of reducing heartbeat.monitor.interval will also not make too much difference because ps will be fine.

sumit_nigam · ‎05-02-2016

@Devaraj Das - Is there any way you are aware of to find out which mechanism Slider uses to heartbeat the container? I am told that it can take up to 15-20 minutes to get the container back.

sumit_nigam · ‎05-02-2016

Ok, I figured out that there is a setting that controls whether the block cache is invalidated when a major compaction happens. In my case that setting is disabled, however.

sumit_nigam · ‎05-02-2016

Hey @rmaruthiyodan - Thanks. Yes, I had to use /proc to find the limits specific to the region server PID. Basically, Ambari restricts this number to 32K by default, and this can be overridden in the blueprint being submitted.
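The /proc check mentioned above can be sketched as follows, assuming a Linux host where `/proc/<pid>/limits` and `/proc/<pid>/fd` are readable by the current user. The function names are hypothetical, and the sample text only mimics the usual layout of the limits file with the 32K value from the post.

```python
from pathlib import Path

def parse_max_open_files(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values stay as strings because a limit may also read 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None

def read_limits(pid):
    """Read the limits file of a running process (Linux only)."""
    return Path(f"/proc/{pid}/limits").read_text()

def count_open_fds(pid):
    """Count file descriptors currently open in a process (Linux only)."""
    return sum(1 for _ in Path(f"/proc/{pid}/fd").iterdir())

# Illustrative sample in the usual /proc/<pid>/limits layout.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            32768                32768                files
"""

soft, hard = parse_max_open_files(SAMPLE_LIMITS)
print(f"soft={soft} hard={hard}")
```

Against a live region server, `parse_max_open_files(read_limits(pid))` together with `count_open_fds(pid)` shows how close the process is to its per-process limit.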

sumit_nigam · ‎04-30-2016

@nmaillard - Thanks. Yes, I am aware of lsof and was planning to use it. Also could there be a setting in hbase which restricts number of open file handles in hbase itself and throws this error? Also, you meant /proc/sys/fs/file-max? Thanks

Member Since	‎04-13-2016 06:51 AM
Last Visited	‎06-29-2017 02:18 PM
Posts	36
Kudos received	4

Cloudera Community

GSS Initiate failed even with a valid kerberos ser...

Re: Limit ressource allocate to HBase query based ...

Re: Query region server port through slider REST A...

Query region server port through slider REST API

SmartSense based troubleshooting for YARN containe...

Re: Apache slider and Hbase timeout setting

Re: Apache slider and Hbase timeout setting

Re: Do you increase zookeeper max session timeout ...

Re: Too many open files in region server logs

Re: Too many open files in region server logs