Member since: 04-13-2016
Posts: 36
Kudos Received: 4
Solutions: 0
06-29-2017
08:47 AM
Thank you @Josh Elser - this is in line with my understanding. It would be fair to say that Phoenix indexes are not like traditional indexes, which derive their main benefit from remaining in memory. A Phoenix index is just another HBase table that allows an orthogonal lookup on the indexed columns (by copying them).
06-28-2017
10:08 AM
If I create a secondary index on the data table and include all of its columns, the index does not get picked up for any query. This is evident from the explain plan of queries such as a count(*) or an inner join. Is that because the Phoenix optimizer figures that having all columns in the index adds no value over doing a self-join with the main table? In that sense, is it bad practice to add all columns to a secondary index? For context, my data table has 2 million rows.
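For contrast, a covered index that includes only the columns a query actually reads does get chosen; a sketch in Phoenix SQL via sqlline (the table, columns, ZooKeeper quorum, and client path are all hypothetical placeholders):

```shell
# Phoenix SQL via sqlline: a covered index on one key column carrying
# two extra columns, then EXPLAIN to see whether the optimizer picks it
/usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181 <<'EOF'
CREATE INDEX idx_city ON data_table (city) INCLUDE (name, population);
EXPLAIN SELECT name FROM data_table WHERE city = 'Pune';
EOF
```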
Labels:
- Apache HBase
- Apache Phoenix
06-28-2017
08:08 AM
Thank you @wengelbrecht - but how does it manage all the services that are using the older keytab? Does it restart them?
06-28-2017
05:17 AM
Ambari creates keytabs internally because it has the details of the AD it connects to. However, how does Ambari regenerate keytabs once the passwords expire on the AD side? How does it ensure that the services depending on those keytabs do not go down? Or do all services have to be shut down when a new keytab is provisioned?
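One way to see whether a keytab has gone stale after a password change on the AD side (a hedged sketch; the keytab path and principal are placeholders for your environment) is to compare the key version number in the keytab with the one the KDC currently issues:

```shell
# List key version numbers (kvno) stored in the keytab (path is a placeholder)
klist -kt /etc/security/keytabs/hbase.service.keytab

# Ask the KDC for the current kvno of the same principal;
# a mismatch means the keytab is stale and must be regenerated
kvno hbase/host1.example.com@ADC.EXAMPLE.COM
```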
Labels:
- Apache Ambari
02-10-2017
10:07 AM
Sure Josh, thanks a lot for taking the time to look at these logs.
02-06-2017
06:50 AM
Hi @Josh Elser - I hope you are able to spot a problem in the data that I uploaded. Is there any hint in it that corroborates my assessment above, or points to a different problem altogether? Any input is appreciated. Thanks,
Sumit
02-02-2017
11:03 AM
"GSS initiate failed" without any detailed error message is like a catch-all in the class com.sun.security.sasl.gsskerb.GssKrb5Server in the method evaluateResponse(byte[] responseData). There are many lines in that big try-catch that can throw this exception. So, after I debugged the running hbase master process through a debugger I am able to clearly see that the stack trace leading to this GSSException is actually coming from method acceptSecContext(InputStream is, int mechTokenSize) in the class Krb5Context. The actual exception is KrbException being thrown as GSSException. This points to issues in keytab in master node vs details in service ticket that region server is presenting, assuming JCE and encryption types on both nodes are appropriate.
02-01-2017
05:16 PM
@Josh Elser - I've uploaded the Kerberos debug logs from the HBase master and region servers - hbase-kerberos-logs.zip. For security reasons, I had to mask the host names, etc.
01-30-2017
03:47 AM
Hi @Josh Elser - I hope the logs I attached reveal some issue; I am not able to find anything amiss. Also, with a simple socket server test program, I notice that we can successfully get a service ticket and send data back and forth, so I'd assume the issue is not on the Kerberos side. Is something on the HBase side messing things up, then? Or maybe some permissions of users / user groups in HDFS or LDAP?
01-25-2017
06:08 AM
Hi @Josh Elser - I have attached 2 logs (with the Kerberos debug flag enabled) for the HMaster and HRegion server. The full logs are quite large, so I have tried to trim some portions, such as the hex dumps of tickets. Do let me know if I need to attach the full logs. Some general comments:
The error "GSS initiate failed" shows up even when the master and region server come up on the same host. After enabling debug logs for HBase, the HMaster shows:
2017-01-20 18:17:11,699 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking of RS /srvuser/hbase/rs/a1.example.com,52412,1484889430172
2017-01-20 18:17:11,823 DEBUG [RpcServer.listener,port=42263] ipc.RpcServer: RpcServer.listener,port=42263: connection from 10.64.130.53:46270; # active connections: 1
2017-01-20 18:17:11,856 DEBUG [RpcServer.reader=2,bindAddress=a1.example.com,port=42263] ipc.RpcServer: Kerberos principal name is srvuser/a1.example.com@ADC.EXAMPLE.COM
2017-01-20 18:17:11,857 DEBUG [RpcServer.reader=2,bindAddress=a1.example.com,port=42263] ipc.RpcServer: Created SASL server with mechanism = GSSAPI
2017-01-20 18:17:11,857 DEBUG [RpcServer.reader=2,bindAddress=a1.example.com,port=42263] ipc.RpcServer: Have read input token of size 1824 for processing by saslServer.evaluateResponse()
2017-01-20 18:17:11,857 DEBUG [RpcServer.reader=2,bindAddress=a1.example.com,port=42263] ipc.RpcServer: RpcServer.listener,port=42263: Caught exception while reading: GSS initiate failed
2017-01-20 18:17:11,857 DEBUG [RpcServer.reader=2,bindAddress=a1.example.com,port=42263] ipc.RpcServer: RpcServer.listener,port=42263: DISCONNECTING client 10.64.130.53:46270 because read count=-1. Number of active connections: 1
Attachments: hbase-regionsvr-kerberos-output.txt, hmaster-kerberos-flag-output.txt
01-23-2017
09:26 AM
@Zhao Chaofeng - I am having the exact same problem with HBase (1.1.2): GSS initiate failed even with a valid Kerberos service ticket. Can you please let me know which version of the Kerberos libraries you re-installed? I am using Kerberos 5 version 1.10.3.
Thanks!
Sumit
01-23-2017
03:19 AM
@Sergey Soldatov - Thanks for the suggestion. I had already added that flag, and it only showed me that I have a valid service ticket (as mentioned above). The SecurityAuth.audit log of the HBase master shows the following errors:
2017-01-20 18:17:08,221 WARN SecurityLogger.org.apache.hadoop.hbase.Server: Auth failed for x.y.z.q:55872:null
2017-01-20 18:17:11,857 WARN SecurityLogger.org.apache.hadoop.hbase.Server: Auth failed for x.y.z.q:46270:null
Not sure if this points to any problem. The IP for which auth fails above is where the HRegion server is running.
01-22-2017
02:36 PM
I am launching HBase (1.1.2) on a Kerberized cluster (AD). The HBase region server fails to connect to the master with the following error:
2017-01-20 18:17:23,944 WARN [regionserver/a1.example.com/xxxxx] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: java.io.IOException: Couldn't setup connection for srvuser/a1.example.com@ADC.EXAMPLE.COM to srvuser/a2.example.com@ADC.EXAMPLE.COM
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2270)
...
Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.readStatus(HBaseSaslRpcClient.java:153)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:189)
I turned on detailed debug logs for Kerberos as well as HBase. I can see that the service ticket is successfully obtained by host a1 for a2:
Found ticket for srvuser/a1.example.com@ADC.EXAMPLE.COM to go to krbtgt/ADC.EXAMPLE.COM@ADC.EXAMPLE.COM expiring on Sat Jan 21 04:17:10 PST 2017
Found ticket for srvuser/a1.example.com@ADC.EXAMPLE.COM to go to srvuser/a2.example.com@ADC.EXAMPLE.COM expiring on Sat Jan 21 04:17:10 PST 2017
Client Principal = srvuser/a1.example.com@ADC.EXAMPLE.COM
Server Principal = srvuser/a2.example.com@ADC.EXAMPLE.COM
Session Key = EncryptionKey: keyType=23 keyBytes
I do not see any errors after the above lines in the detailed Kerberos logs, so I assume the "GSS initiate failed" problem has nothing to do with Kerberos itself; otherwise I would have seen some error reported (such as, say, a corrupted ticket). I notice that a "GSS initiate failed" message without any details is called out by experts as one of the most useless messages - Steve's error messages to fear. I have already verified that the unlimited-strength JCE policy files are present, and that both hosts are using the same encryption algorithm. Can anyone help here, even if only with the next steps I can take to debug this? Thank you!
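A quick server-side sanity check for this kind of failure (a hedged sketch; the keytab path is a placeholder, and the principal mirrors the one in the logs above) is to confirm that the master's keytab actually decrypts and matches the KDC's current key:

```shell
# On the master host (a2): list the keys and encryption types in the keytab
klist -kte /etc/security/keytabs/srvuser.keytab

# Authenticate directly from the keytab; a failure here means the keytab
# itself is stale or carries the wrong encryption types
kinit -kt /etc/security/keytabs/srvuser.keytab srvuser/a2.example.com@ADC.EXAMPLE.COM
```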
Labels:
- Apache Hadoop
- Apache HBase
09-22-2016
04:21 AM
@Artem Ervits @Neeraj Sabharwal - I am trying to leverage size-based throttling but keep getting a ThrottlingException when I start HBase, even when there is hardly any data in HBase. I am sure this is a misconfiguration on my end, but I cannot seem to find it. Any inputs would be appreciated. To add, there is some correlation between the number of pre-splits and the throttling size limit, because the error shows up only when the number of pre-splits is larger.
Details: HBase version: 1.1.2, number of region servers: 4, number of regions: 116, heap memory per region server: 2GB.
Quotas set:
TABLE => ns1:table1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
TABLE => ns2:table2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
Region server stack trace (notice below that the error is about the read size limit being exceeded, while the size of the scan is only 28 bytes):
2016-09-17 22:35:40,674 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] quotas.RegionServerQuotaManager: Throttling exception for user=root table=ns1:table1 numWrites=0 numReads=0 numScans=1: read size limit exceeded - wait 0.00sec
2016-09-17 22:35:40,676 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] ipc.RpcServer: B.defaultRpcServer.handler=55,queue=1,port=58526: callId: 52 service: ClientService methodName: Scan size: 28 connection: 10.65.141.170:42806
org.apache.hadoop.hbase.quotas.ThrottlingException: read size limit exceeded - wait 0.00sec
at org.apache.hadoop.hbase.quotas.ThrottlingException.throwThrottlingException(ThrottlingException.java:107)
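For reference, quotas like the ones listed above are managed with hbase shell commands along these lines (a sketch; the table name mirrors the post, and a size-valued LIMIT is what makes the throttle a REQUEST_SIZE one):

```shell
# In hbase shell: set, inspect, and remove a size-based request throttle
hbase shell <<'EOF'
set_quota TYPE => THROTTLE, TABLE => 'ns1:table1', LIMIT => '10G/sec'
list_quotas
set_quota TYPE => THROTTLE, TABLE => 'ns1:table1', LIMIT => NONE
EOF
```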
08-03-2016
03:38 AM
Hi @billie - Thanks. Actually, I was able to get that part working (and yes, the changes are needed both in appConfig and metainfo). However, when more than one region server is started on the same host (on different ports), Slider gives the wrong port for the first region server; the other region servers' ports are correct. I think that is a bug in Slider.
07-31-2016
06:59 AM
I am using HBase 0.98 and Slider 0.81.1. I want to use the Slider REST APIs to get the port numbers of region server instances deployed through Slider. I assume the API to use is https://inldmqarh71n2:8090/proxy/application_1467115608017_0178/ws/v1/slider/publisher/exports/servers However, I get a NullPointerException when I issue this API call. Do I need to specify anything in metaInfo.xml to make this work, or is it a Slider version issue?
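The call can be reproduced from the command line like this (a sketch; the host, port, and application ID are the ones quoted in the post and would differ per cluster):

```shell
# Query Slider's publisher endpoint for exported server ports
# (-k skips TLS verification; drop it if proper certs are in place)
curl -k -s \
  "https://inldmqarh71n2:8090/proxy/application_1467115608017_0178/ws/v1/slider/publisher/exports/servers"
```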
Labels:
- Apache HBase
- Apache Slider
- Apache YARN
06-08-2016
10:26 AM
2 Kudos
I've been having the following questions about SmartSense. Would anyone who has used it be able to help?
1. We host HBase as a YARN app and use Slider for that. I notice that SmartSense has support for HBase monitoring/troubleshooting. Does that extend to HBase-on-YARN too?
2. Does SmartSense help with piecing together troubleshooting information from different logs? For example, a YARN container app may be down because the YARN NodeManager went down, which in turn may be down because the YARN ResourceManager terminated all apps on that NodeManager. Piecing this together today requires looking into ResourceManager/NodeManager logs along with the HBase logs. Another case is an app going down because ZooKeeper has hit the maxClientCnxns limit and will not allow any more incoming connections from that host. Those are just a representative set of problems. Does SmartSense help there?
3. Does SmartSense also help identify issues such as Kerberos ticket renewal issues, SSL issues, and open file handle issues?
Thanks, Sumit
Labels:
- Apache HBase
- Hortonworks SmartSense
05-03-2016
04:18 AM
@billie - Thank you for the info. So it is exactly as I thought. In my opinion, ps is completely wrong in the context of HBase, because even when ps comes back successfully, the region server can be dead for all practical purposes. Unfortunately, because of this, my idea of reducing heartbeat.monitor.interval will not make much difference either, since ps will report fine.
05-02-2016
02:52 PM
@Devaraj Das - Is there any way you are aware of through which I can find the mechanism Slider uses to heartbeat the container? I am told it can take up to 15-20 minutes to get the container back.
05-02-2016
02:49 PM
Ok, I figured out there is a setting that controls whether the block cache is invalidated when major compaction happens. In my case, however, that setting is disabled.
05-02-2016
02:44 PM
Hey @rmaruthiyodan - Thanks. Yes, I had to use /proc to find the limits specific to the region server PID. Basically, Ambari restricts this number to 32K by default, and it can be overridden in the blueprint being submitted.
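For reference, the per-process check looks like this (a sketch; /proc/self is used so the snippet is self-contained, substitute the region server PID, e.g. from pgrep -f HRegionServer):

```shell
# Read the effective open-file limit of a specific process from /proc;
# this is what the process actually gets, regardless of the shell's ulimit
grep "Max open files" /proc/self/limits
```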
04-30-2016
12:51 PM
@nmaillard - Thanks. Yes, I am aware of lsof and was planning to use it. Could there also be a setting in HBase itself that restricts the number of open file handles and throws this error? Also, did you mean /proc/sys/fs/file-max? Thanks
04-30-2016
07:13 AM
I have 3 region servers and their total size on HDFS is only ~50G. I have ulimit set to unlimited, and for the hbase user the value is also very high (32K+). I am noticing the following in my logs very often, after which I start getting HFile corruption exceptions:
2016-04-27 16:44:46,845 WARN [StoreFileOpenerThread-g-1] hdfs.DFSClient: Failed to connect to /10.45.0.51:50010 for block, add to deadNodes and continue. java.net.SocketException: Too many open files
java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
After many of these open-file issues, I get a barrage of HFile corruption issues too, and HBase fails to come up:
2016-04-27 16:44:46,313 ERROR [RS_OPEN_REGION-secas01aplpd:44461-1] handler.OpenRegionHandler: Failed open of region=lm:DS_326_A_stage,\x7F\xFF\xFF\xF8,1460147940285.1a764b8679b8565c5d6d63e349212cbf., starting to roll back the global memstore size. java.io.IOException: java.io.IOException: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://mycluster/MA/hbase/data/lm/DS_326_A_stage/1a764b8679b8565c5d6d63e349212cbf/e/63083720d739491eb97544e16969ffc7
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:836)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:747)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:718)
My questions are two:
1. No other process on this node shows a too-many-open-files issue; even the data node does not seem to show this error in its logs. Why, then, is this error reported?
2. Would an OfflineMetaRepair followed by hbck -fixMeta and hbck -fixAssignments solve the issue?
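One way to quantify the problem (a sketch; the pgrep pattern is an assumption about the process name and requires a running region server) is to count the region server's open descriptors and compare against its effective limit:

```shell
# Count open file descriptors of the region server and show its limit
RS_PID=$(pgrep -f HRegionServer | head -n 1)
ls /proc/"$RS_PID"/fd | wc -l                  # descriptors currently open
grep "Max open files" /proc/"$RS_PID"/limits   # effective per-process cap
```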
Labels:
- Apache HBase
04-30-2016
07:02 AM
@Laurent Edel - Thanks, I had not considered the fact that splitting does not always create two 10G regions. I am using HBase 0.98. So if I set ConstantSizeRegionSplitPolicy through the hbase shell, can I then assume regions will always be 10G in size?
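For reference, the split policy can be pinned per table from the hbase shell roughly like this (a sketch; 'my_table' is a placeholder, and the exact shell syntax for table attributes varies by HBase version):

```shell
# In hbase shell: pin a table to ConstantSizeRegionSplitPolicy
hbase shell <<'EOF'
alter 'my_table', {METADATA => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}
EOF
```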
04-30-2016
07:00 AM
@Enis - I have salted rowkeys, so I am hopeful the region servers will not hotspot.
04-29-2016
05:04 PM
1 Kudo
I notice the following line in my region server logs:
2016-04-27 12:11:11,924 WARN [MemStoreFlusher.1] regionserver.CompactSplitThread: Total number of regions is approaching the upper limit 1000. Please consider taking a look at http://hbase.apache.org/book.html#ops.regionmgt
And also:
2016-04-27 16:31:47,799 INFO [regionserver54130] regionserver.HRegionServer: Waiting on 4007 regions to close
This is surprising because I do not have that much data. Given the default value of hbase.hregion.max.filesize is 10G, this would imply 40TB of data; that is not even the size of all my disks put together. Does this mean many empty regions are getting created? If so, why? Is there any performance implication to carrying these empty regions around? One implication, certainly, is that many file descriptors are used up. Can I get rid of them?
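For reference, adjacent empty regions can be merged online from the hbase shell (a sketch; the arguments are encoded region names, and the two shown here are placeholders you would take from the master UI or the meta table):

```shell
# In hbase shell: merge two adjacent (e.g. empty) regions online
hbase shell <<'EOF'
merge_region 'd6ace8f064f1b66e9b59c32e725e8d0a', '0c0ad8e3f922dfa6079e4a8a9d4c20f8'
EOF
```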
Labels:
- Apache HBase
- Apache Hive
04-22-2016
02:39 PM
Another question: where can I specify a value for heartbeat.monitor.interval?
04-22-2016
02:37 PM
@Devaraj Das - So, I managed to take a look at the Slider classes. I see it uses a heartbeat mechanism. Would you know what the agent uses for the heartbeat? Is it a simple 'ps' to figure out whether the process is alive? I am trying to understand this because, if it is as simple as 'ps', I can likely add another script that watches the znode for this region server and shuts it down locally, which would then lead to the Slider AM relaunching another container. I see another option to salvage some of these containers faster by looking closely at the Slider classes HeartbeatMonitor and AgentProviderService. The default sleep time of the monitoring thread is 60 sec, and it can be controlled through the heartbeat.monitor.interval property in the AgentKey class. The logic is such that if 2 consecutive monitoring intervals miss a heartbeat, the container is marked as DEAD. Now, my ZooKeeper timeout is 40 sec. This means the region server is marked dead once 40 sec are over, yet the agent considers it fine until 2*60 = 120 sec. So one thing I see I need to do is make 2 * heartbeat.monitor.interval equal to the ZooKeeper session timeout. Of course, if a heartbeat is still received even then, this logic can't help.
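Under that logic, the numbers above line up roughly like this (a sketch; the property name is the one discussed in the Slider classes, the appConfig placement and millisecond unit are assumptions): with a 40 sec ZooKeeper session timeout, the interval should be 40/2 = 20 sec, so that 2 missed intervals equal the ZooKeeper timeout.

```json
{
  "global": {
    "heartbeat.monitor.interval": "20000"
  }
}
```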
04-19-2016
05:53 PM
I use Apache Slider for launching HBase containers. Is there a setting that controls how long it takes for Slider to consider a region server dead? It takes a region server some time to shut down even after the HMaster marks it as dead; this could be due to a GC pause it is dealing with. However, Slider will not launch a new container/region server until the container is given up by the existing region server that is hung / already marked dead by the master. In such a case, the wait to launch a new region server instance can be arbitrarily long. How does Slider monitor the health of a region server? Is there a way to make it sync with the HMaster in deciding whether a region server is dead?
Labels:
- Apache HBase
- Apache Slider
- Apache YARN
04-14-2016
04:10 PM
Ok, I was not aware that major compaction would invalidate the block cache. Not sure why that should be so, though. Any link where I can read more on this?