Support Questions


Connect a NiFi service running on an EC2 instance to HBase running on a separate EMR cluster

New Contributor

We are trying to connect an instance of NiFi running on an EC2 instance to an HBase database running on an EMR cluster. We are having trouble figuring out how to point the HBase_1_1_2_ClientService at the HBase configuration files, which live on a different machine (the EMR cluster).

1 ACCEPTED SOLUTION

Master Mentor

You could also modify the local /etc/hosts file on your EC2 instance so that the hostname "ip-10-40-20-197.ec2.internal" resolves to the proper external IP addresses for those ZooKeeper nodes, if they have them.
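As a concrete sketch, the workaround amounts to adding an entry like the following to /etc/hosts on the NiFi EC2 instance. The IP address here is a placeholder; substitute the EMR node's actual externally reachable IP:

```text
# /etc/hosts on the NiFi EC2 instance (placeholder IP)
203.0.113.10   ip-10-40-20-197.ec2.internal
```

If the EMR cluster uses multiple ZooKeeper nodes, each internal hostname from the quorum would need its own entry.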


7 REPLIES

Master Guru

Usually, if NiFi runs on a node that is not part of the HDFS/HBase cluster, you copy the appropriate config files (hbase-site.xml and core-site.xml) to the NiFi node.
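A rough sketch of that copy step is below. The master hostname, SSH user, and destination directory are placeholders, and EMR's default config locations (/etc/hbase/conf and /etc/hadoop/conf) are assumed:

```shell
# Sketch only -- host, user, and destination are placeholders.
EMR_MASTER=hadoop@emr-master-public-dns   # hypothetical EMR master address
DEST=/opt/nifi/conf                       # any directory NiFi can read

# Pull the client configs from the EMR master onto the NiFi node:
scp "$EMR_MASTER":/etc/hbase/conf/hbase-site.xml  "$DEST"/
scp "$EMR_MASTER":/etc/hadoop/conf/core-site.xml  "$DEST"/
```

The controller service's "Hadoop Configuration Files" property can then be pointed at a comma-separated list such as `/opt/nifi/conf/core-site.xml,/opt/nifi/conf/hbase-site.xml`.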

New Contributor

I did try that, but I am still getting this error on the HBase_1_1_2_ClientService, and it is also stuck in the Enabling state.

(screenshot attached: 5834-capture.png)

Master Guru

Are you sure that NiFi can reach all the services in EMR (HBase, ZooKeeper, etc.)?

Also, can you look in nifi_home/logs/nifi-app.log and see if there is a full stack trace that goes with that error? If so, it would be helpful to see it, thanks.

New Contributor

That's actually what I am starting to think the issue is. How would I make sure that NiFi can reach the EMR services?

The log is also attached: nifi-app.zip

Super Guru
2016-07-18 15:00:01,784 WARN [StandardProcessScheduler Thread-3] o.a.h.h.zookeeper.RecoverableZooKeeper Unable to create ZooKeeper Connection
java.net.UnknownHostException: ip-10-40-20-197.ec2.internal: unknown error
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_77]
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_77]
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_77]
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276) ~[na:1.8.0_77]
    at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[na:1.8.0_77]
    at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[na:1.8.0_77]
    at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:221) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:541) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.checkIfBaseNodeAvailable(ConnectionManager.java:895) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.access$400(ConnectionManager.java:545) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1483) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1524) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1553) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1704) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3917) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.HBaseAdmin.listTableNames(HBaseAdmin.java:413) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.hadoop.hbase.client.HBaseAdmin.listTableNames(HBaseAdmin.java:397) [hbase-client-1.1.2.jar:1.1.2]
    at org.apache.nifi.hbase.HBase_1_1_2_ClientService.onEnabled(HBase_1_1_2_ClientService.java:181) [nifi-hbase_1_1_2-client-service-0.6.1.jar:0.6.1]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_77]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_77]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137) [nifi-framework-core-0.6.1.jar:0.6.1]
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125) [nifi-framework-core-0.6.1.jar:0.6.1]
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70) [nifi-framework-core-0.6.1.jar:0.6.1]
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47) [nifi-framework-core-0.6.1.jar:0.6.1]
    at org.apache.nifi.controller.service.StandardControllerServiceNode$1.run(StandardControllerServiceNode.java:285) [nifi-framework-core-0.6.1.jar:0.6.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_77]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_77]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_77]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]

This looks like a DNS issue. The hostname "ip-10-40-20-197.ec2.internal" is not resolvable from the node you are running NiFi on. Since this is on EC2, you likely need to configure HBase to use the external hostnames, not the internal names, if you intend to communicate with it from outside those hosts. I don't know what fancy things you can do with AWS to make this possible otherwise (maybe you can set up some private network between your EMR cluster and your EC2 nodes?).
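A quick way to confirm the diagnosis from the NiFi host is to check whether the ZooKeeper hostname from the stack trace resolves at all; `getent` exits non-zero when it does not:

```shell
# Check name resolution for the ZooKeeper host from the stack trace.
host="ip-10-40-20-197.ec2.internal"
if getent hosts "$host" > /dev/null; then
    status="resolves"
else
    status="does not resolve from this host"
fi
echo "$host: $status"
```

If the name does resolve, the next thing to verify is that the ZooKeeper port (2181 by default) is actually reachable, e.g. with `nc -zv "$host" 2181`.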

This document on setting up multi-homing for Hadoop and HBase might be helpful, as that's essentially the environment EC2 sets up by default: https://community.hortonworks.com/articles/24277/parameters-for-multi-homing.html


Master Guru

@Michael Sobelman That hostname is not resolvable from the node you are trying to access it from. You can get fancy on AWS and configure routing tables by setting up a proper VPN between the EMR and NiFi nodes. Another option I have used is Route 53, which gives you publicly available DNS. Lastly, you can put an ELB in front of your EMR HBase master node; you may have to script it up (via boot scripts) to configure your ELB to point at the new internal IP.