Member since: 07-17-2019
Posts: 738
Kudos Received: 433
Solutions: 111
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3473 | 08-06-2019 07:09 PM |
|  | 3669 | 07-19-2019 01:57 PM |
|  | 5192 | 02-25-2019 04:47 PM |
|  | 4663 | 10-11-2018 02:47 PM |
|  | 1768 | 09-26-2018 02:49 PM |
07-22-2016
07:57 PM
3 Kudos
(Assuming you're referring to mutable global indexes.) There is a direct relationship between the number of indexes you maintain and the amount of index data that must be written for every update to the data table. So, if you are indexing 10 columns for a data table, you're actually writing 11 updates for every one update your client writes. At only 100GB of data per day on 16 nodes, this seems like it would be reasonable to manage, but you are pushing a lot of work to the RegionServers. I would make sure that the RegionServers are adequately sized to handle all of the extra load. Using immutable tables pushes this work to the client, which might be a more scalable solution: https://phoenix.apache.org/secondary_indexing.html#Immutable_Tables
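For reference, a minimal sketch of the immutable-table route through the Phoenix JDBC driver might look like the following (the ZooKeeper quorum, table, and column names are placeholders, not anything from your cluster):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ImmutableIndexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper quorum; point this at your own cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // IMMUTABLE_ROWS=true tells Phoenix the rows are write-once,
            // so index maintenance is done on the client instead of the RegionServers.
            stmt.execute("CREATE TABLE IF NOT EXISTS events ("
                + " id VARCHAR PRIMARY KEY, col1 VARCHAR, col2 VARCHAR)"
                + " IMMUTABLE_ROWS=true");
            stmt.execute("CREATE INDEX IF NOT EXISTS idx_col1 ON events (col1)");
        }
    }
}

The tradeoff is that you must not upsert over existing rows in an immutable table, since the indexes would silently go out of sync.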
07-21-2016
07:51 PM
You got the same error message?
07-21-2016
06:51 PM
Great, glad you got it working in the end. I'm not sure how resource localization works in Spark (I can only compare it to how YARN itself works). The difference between those two UserGroupInformation calls is that the one you invoked does not alter the static "current user" state inside UserGroupInformation and the JAAS login system, which is why you need the doAs() call. If you use loginUserFromKeytab() instead, you can remove the doAs() and just interact with HBase normally.
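In other words (reusing the placeholder principal, keytab path, and configuration from the earlier snippet), the keytab-login variant is roughly:

// Sketch only: "name@xyz.com", keyTab, and conf are placeholders.
UserGroupInformation.loginUserFromKeytab("name@xyz.com", keyTab);
// The static "current user" is now the keytab login, so no doAs() wrapper is needed.
Connection connection = ConnectionFactory.createConnection(conf);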
07-21-2016
05:44 PM
That is not an error; it's an INFO-level message. ZooKeeper is just telling you that a client tried to create the node /brokers/ids but it already existed in ZooKeeper (you cannot create a node that already exists). I'm not sure why you can't see the Kafka messages, though.
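If you want to confirm what is actually registered, a quick sketch against the plain ZooKeeper client (connect string is a placeholder) would be:

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class BrokerCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; use the same ZooKeeper quorum Kafka is configured with.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, event -> { });
        try {
            // Each live broker registers an ephemeral child znode under /brokers/ids.
            List<String> brokerIds = zk.getChildren("/brokers/ids", false);
            System.out.println("Registered broker ids: " + brokerIds);
        } finally {
            zk.close();
        }
    }
}

An empty list would mean no brokers are currently connected, which would also explain not seeing any messages.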
07-21-2016
04:02 PM
4 Kudos
You should not rely on an external ticket cache for distributed jobs. The best solution is to ship a keytab with your application, or to rely on a keytab being deployed on all nodes where your Spark task may be executed. You likely want to replace:

UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("name@xyz.com", keyTab);
UserGroupInformation.setLoginUser(ugi);

with:

UserGroupInformation.loginUserFromKeytab("name@xyz.com", keyTab);
connection = ConnectionFactory.createConnection(conf);

With your approach above, you would need to do something like the following after obtaining the UserGroupInformation instance (a PrivilegedExceptionAction is used because createConnection() throws a checked IOException):

ugi.doAs(new PrivilegedExceptionAction<Void>() {
  public Void run() throws IOException {
    connection = ConnectionFactory.createConnection(conf);
    ...
    return null;
  }
});
07-21-2016
03:04 PM
🙂 no worries. Just wanted to avoid misinformation. Calling PQS a "proxy server" is definitely the best phrase I can come up with. It uses protobuf to accomplish this, but users don't really have to be aware that's happening (so I tend to not mention it unless explaining how it works).
07-18-2016
06:32 PM
2 Kudos
2016-07-18 15:00:01,784 WARN [StandardProcessScheduler Thread-3] o.a.h.h.zookeeper.RecoverableZooKeeper Unable to create ZooKeeper Connection
java.net.UnknownHostException: ip-10-40-20-197.ec2.internal: unknown error
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_77]
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_77]
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_77]
at java.net.InetAddress.getAllByName0(InetAddress.java:1276) ~[na:1.8.0_77]
at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[na:1.8.0_77]
at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[na:1.8.0_77]
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:221) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:541) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.checkIfBaseNodeAvailable(ConnectionManager.java:895) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.access$400(ConnectionManager.java:545) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1483) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1524) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1553) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1704) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3917) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.HBaseAdmin.listTableNames(HBaseAdmin.java:413) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.HBaseAdmin.listTableNames(HBaseAdmin.java:397) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.nifi.hbase.HBase_1_1_2_ClientService.onEnabled(HBase_1_1_2_ClientService.java:181) [nifi-hbase_1_1_2-client-service-0.6.1.jar:0.6.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_77]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_77]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137) [nifi-framework-core-0.6.1.jar:0.6.1]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125) [nifi-framework-core-0.6.1.jar:0.6.1]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70) [nifi-framework-core-0.6.1.jar:0.6.1]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47) [nifi-framework-core-0.6.1.jar:0.6.1]
at org.apache.nifi.controller.service.StandardControllerServiceNode$1.run(StandardControllerServiceNode.java:285) [nifi-framework-core-0.6.1.jar:0.6.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
This looks like a DNS issue. The hostname "ip-10-40-20-197.ec2.internal" cannot be resolved from the node on which you are running NiFi. Since this is on EC2, you likely need to configure HBase to use the external hostnames, not the internal ones, if you intend to communicate with it from outside those hosts. I don't know what other options AWS gives you to make this possible (maybe you can set up a private network between your EMR cluster and your EC2 nodes?). This document on setting up multi-homing for Hadoop and HBase might be helpful, as that is essentially the environment EC2 sets up for you by default: https://community.hortonworks.com/articles/24277/parameters-for-multi-homing.html
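As a quick check, you can verify what the NiFi host thinks of that name with a few lines of Java (the hostname below is just the one from your stack trace; swap in whichever host fails for you):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "ip-10-40-20-197.ec2.internal";
        try {
            InetAddress addr = InetAddress.getByName(host);
            System.out.println(host + " resolves to " + addr.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println(host + " does not resolve from this node");
        }
    }
}

If that fails, no amount of HBase client configuration will help until the name resolves (or until HBase advertises a name that does).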
07-17-2016
09:31 PM
I would also say that you should understand the data you are loading so that you can choose reasonable split points. Even if the keys are hashed, you should be able to work out what the first byte/character of the rowkey will be and pre-split accordingly (using RegionSplitter or by hand).
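For example, if you know the rowkeys start with a lowercase hex character, a sketch of pre-splitting by hand with the HBase 1.x Java API could look like this (table and column family names are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
            desc.addFamily(new HColumnDescriptor("cf"));
            // Rowkeys starting with a hex digit: eight roughly even regions.
            byte[][] splits = new byte[][] {
                Bytes.toBytes("2"), Bytes.toBytes("4"), Bytes.toBytes("6"),
                Bytes.toBytes("8"), Bytes.toBytes("a"), Bytes.toBytes("c"),
                Bytes.toBytes("e")
            };
            admin.createTable(desc, splits);
        }
    }
}

RegionSplitter can do the same thing from the command line if you would rather not write code.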
07-13-2016
02:36 PM
1 Kudo
@sunny malik, are you still using the org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy split policy? Check hbase.regionserver.region.split.policy in hbase-site.xml.
Try using ConstantSizeRegionSplitPolicy instead, which will only split a region once it reaches the configured maximum size (10GB by default). IncreasingToUpperBoundRegionSplitPolicy splits more aggressively in the beginning, slowing down to larger regions as the number of regions for the table increases. You can find more information in the HBase Book.
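If you only want to change this for the one table rather than cluster-wide, a sketch along these lines should work with the HBase 1.x Java API (table name is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SplitPolicySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("my_table"); // hypothetical table name
            HTableDescriptor desc = admin.getTableDescriptor(table);
            // Only split once a region reaches a fixed size (10GB here).
            desc.setRegionSplitPolicyClassName(
                "org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy");
            desc.setMaxFileSize(10L * 1024 * 1024 * 1024);
            admin.modifyTable(table, desc);
        }
    }
}

Setting hbase.regionserver.region.split.policy in hbase-site.xml changes the default for every table; the per-table setting above overrides it.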