Member since: 07-17-2019
Posts: 738
Kudos Received: 433
Solutions: 111
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3473 | 08-06-2019 07:09 PM |
|  | 3669 | 07-19-2019 01:57 PM |
|  | 5192 | 02-25-2019 04:47 PM |
|  | 4663 | 10-11-2018 02:47 PM |
|  | 1768 | 09-26-2018 02:49 PM |
07-22-2016
07:57 PM
3 Kudos
(Assuming you're referring to mutable global indexes.) There is a direct relationship between the number of indexes you maintain and the amount of index data that must be written for every update to the data table. So, if you are indexing 10 columns for a data table, you're actually writing 11 updates for every one update your client writes. At only 100GB of data per day on 16 nodes, this seems like it would be reasonable to manage, but you are pushing a lot of work to the RegionServers. I would make sure that the RegionServers are adequately sized to handle all of the extra load. Using immutable tables pushes this work to the client, which might be a more scalable solution: https://phoenix.apache.org/secondary_indexing.html#Immutable_Tables
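For reference, a minimal sketch of the immutable-table route through the Phoenix JDBC driver might look like the following (the ZooKeeper quorum, table, and column names are placeholders, not anything from your cluster):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ImmutableIndexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper quorum; point this at your own cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // IMMUTABLE_ROWS=true tells Phoenix the rows are write-once,
            // so index maintenance is done on the client instead of the RegionServers.
            stmt.execute("CREATE TABLE IF NOT EXISTS events ("
                + " id VARCHAR PRIMARY KEY, col1 VARCHAR, col2 VARCHAR)"
                + " IMMUTABLE_ROWS=true");
            stmt.execute("CREATE INDEX IF NOT EXISTS idx_col1 ON events (col1)");
        }
    }
}

The tradeoff is that you must not upsert over existing rows in an immutable table, since the indexes would silently go out of sync.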
07-21-2016
07:51 PM
You got the same error message?
07-21-2016
06:51 PM
Great, glad you got it working in the end. I'm not sure how resource localization works in Spark (I can only compare it to how YARN itself works). The difference between those two UserGroupInformation calls is that the one you invoked does not alter the static "current user" state inside UserGroupInformation and the JAAS login system, which is why you need the doAs() call. If you use loginUserFromKeytab() instead, you can remove the doAs() and just interact with HBase normally.
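In other words (reusing the placeholder principal, keytab path, and configuration from the earlier snippet), the keytab-login variant is roughly:

// Sketch only: "name@xyz.com", keyTab, and conf are placeholders.
UserGroupInformation.loginUserFromKeytab("name@xyz.com", keyTab);
// The static "current user" is now the keytab login, so no doAs() wrapper is needed.
Connection connection = ConnectionFactory.createConnection(conf);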
07-21-2016
05:44 PM
That is not an error; it's an INFO-level message. ZooKeeper is just telling you that a client tried to create the node /brokers/ids but it already existed in ZooKeeper (you cannot create a node that already exists). I'm not sure why you can't see the Kafka messages, though.
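If you want to confirm what is actually registered, a quick sketch against the plain ZooKeeper client (connect string is a placeholder) would be:

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class BrokerCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; use the same ZooKeeper quorum Kafka is configured with.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, event -> { });
        try {
            // Each live broker registers an ephemeral child znode under /brokers/ids.
            List<String> brokerIds = zk.getChildren("/brokers/ids", false);
            System.out.println("Registered broker ids: " + brokerIds);
        } finally {
            zk.close();
        }
    }
}

An empty list would mean no brokers are currently connected, which would also explain not seeing any messages.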
07-21-2016
04:02 PM
4 Kudos
You should not rely on an external ticket cache for distributed jobs. The best solution is to ship a keytab with your application, or to rely on a keytab being deployed on all nodes where your Spark task may be executed. You likely want to replace:

UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("name@xyz.com", keyTab);
UserGroupInformation.setLoginUser(ugi);

with:

UserGroupInformation.loginUserFromKeytab("name@xyz.com", keyTab);
connection = ConnectionFactory.createConnection(conf);

With your approach above, you would need to do something like the following after obtaining the UserGroupInformation instance (a PrivilegedExceptionAction is used because createConnection() throws a checked IOException):

ugi.doAs(new PrivilegedExceptionAction<Void>() {
  public Void run() throws IOException {
    connection = ConnectionFactory.createConnection(conf);
    ...
    return null;
  }
});
07-21-2016
03:04 PM
🙂 no worries. Just wanted to avoid misinformation. Calling PQS a "proxy server" is definitely the best phrase I can come up with. It uses protobuf to accomplish this, but users don't really have to be aware that's happening (so I tend to not mention it unless explaining how it works).
07-18-2016
06:32 PM
2 Kudos
2016-07-18 15:00:01,784 WARN [StandardProcessScheduler Thread-3] o.a.h.h.zookeeper.RecoverableZooKeeper Unable to create ZooKeeper Connection
java.net.UnknownHostException: ip-10-40-20-197.ec2.internal: unknown error
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_77]
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_77]
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_77]
at java.net.InetAddress.getAllByName0(InetAddress.java:1276) ~[na:1.8.0_77]
at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[na:1.8.0_77]
at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[na:1.8.0_77]
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:221) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:541) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.checkIfBaseNodeAvailable(ConnectionManager.java:895) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.access$400(ConnectionManager.java:545) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1483) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1524) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1553) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1704) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3917) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.HBaseAdmin.listTableNames(HBaseAdmin.java:413) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.hadoop.hbase.client.HBaseAdmin.listTableNames(HBaseAdmin.java:397) [hbase-client-1.1.2.jar:1.1.2]
at org.apache.nifi.hbase.HBase_1_1_2_ClientService.onEnabled(HBase_1_1_2_ClientService.java:181) [nifi-hbase_1_1_2-client-service-0.6.1.jar:0.6.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_77]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_77]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137) [nifi-framework-core-0.6.1.jar:0.6.1]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125) [nifi-framework-core-0.6.1.jar:0.6.1]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70) [nifi-framework-core-0.6.1.jar:0.6.1]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47) [nifi-framework-core-0.6.1.jar:0.6.1]
at org.apache.nifi.controller.service.StandardControllerServiceNode$1.run(StandardControllerServiceNode.java:285) [nifi-framework-core-0.6.1.jar:0.6.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
This looks like a DNS issue. The hostname "ip-10-40-20-197.ec2.internal" cannot be resolved from the node on which you are running NiFi. Since this is on EC2, you likely need to configure HBase to use the external hostnames, not the internal ones, if you intend to communicate with it from outside those hosts. I don't know what other options AWS gives you to make this possible (maybe you can set up a private network between your EMR cluster and your EC2 nodes?). This document on setting up multi-homing for Hadoop and HBase might be helpful, as that is essentially the environment EC2 sets up for you by default: https://community.hortonworks.com/articles/24277/parameters-for-multi-homing.html
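As a quick check, you can verify what the NiFi host thinks of that name with a few lines of Java (the hostname below is just the one from your stack trace; swap in whichever host fails for you):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "ip-10-40-20-197.ec2.internal";
        try {
            InetAddress addr = InetAddress.getByName(host);
            System.out.println(host + " resolves to " + addr.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println(host + " does not resolve from this node");
        }
    }
}

If that fails, no amount of HBase client configuration will help until the name resolves (or until HBase advertises a name that does).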
07-17-2016
09:31 PM
I would also say that you should understand the data you are loading so that you can choose reasonable split points. Even if the keys are hashed, you should be able to work out what the first byte/character of the rowkey will be and pre-split accordingly (using RegionSplitter or by hand).
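For example, if you know the rowkeys start with a lowercase hex character, a sketch of pre-splitting by hand with the HBase 1.x Java API could look like this (table and column family names are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
            desc.addFamily(new HColumnDescriptor("cf"));
            // Rowkeys starting with a hex digit: eight roughly even regions.
            byte[][] splits = new byte[][] {
                Bytes.toBytes("2"), Bytes.toBytes("4"), Bytes.toBytes("6"),
                Bytes.toBytes("8"), Bytes.toBytes("a"), Bytes.toBytes("c"),
                Bytes.toBytes("e")
            };
            admin.createTable(desc, splits);
        }
    }
}

RegionSplitter can do the same thing from the command line if you would rather not write code.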
07-13-2016
02:36 PM
1 Kudo
@sunny malik, are you still using the org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy split policy? Check hbase.regionserver.region.split.policy in hbase-site.xml.
Try using ConstantSizeRegionSplitPolicy instead, which will only split a region once it reaches the configured maximum size (10GB by default). IncreasingToUpperBoundRegionSplitPolicy splits more aggressively in the beginning, slowing down to larger regions as the number of regions for the table increases. You can find more information in the HBase Book.
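If you only want to change this for the one table rather than cluster-wide, a sketch along these lines should work with the HBase 1.x Java API (table name is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SplitPolicySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("my_table"); // hypothetical table name
            HTableDescriptor desc = admin.getTableDescriptor(table);
            // Only split once a region reaches a fixed size (10GB here).
            desc.setRegionSplitPolicyClassName(
                "org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy");
            desc.setMaxFileSize(10L * 1024 * 1024 * 1024);
            admin.modifyTable(table, desc);
        }
    }
}

Setting hbase.regionserver.region.split.policy in hbase-site.xml changes the default for every table; the per-table setting above overrides it.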