04-18-2016
01:32 PM
This error can also occur if the Atlas service is turned off. From the exception stack trace:

at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: java.net.ConnectException: Connection refused
at org.apache.atlas.security.SecureClientUtils$1$1.run(SecureClientUtils.java:107)
at org.apache.atlas.security.SecureClientUtils$1$1.run(SecureClientUtils.java:99)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.atlas.security.SecureClientUtils$1.getHttpURLConnection(SecureClientUtils.java:99)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
... 26 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)

See this link for more information: https://community.hortonworks.com/questions/22396/ranger-dependency-on-atlas.html
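A quick way to confirm whether the Atlas server is reachable before digging further is to probe its web port. This is a minimal sketch; 21000 is the default Atlas server port, and `atlas-host` is a placeholder for your actual Atlas hostname:

```shell
#!/usr/bin/env bash
# Probe a host/port to see whether a service (e.g. Atlas) is listening.
check_port() {
  # Try to open a TCP connection via bash's /dev/tcp pseudo-device;
  # the subshell exits non-zero if nothing is listening ("Connection refused").
  ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null
}

# "atlas-host" and port 21000 (the Atlas default) are placeholders here.
if check_port atlas-host 21000; then
  echo "Atlas port is open"
else
  echo "Connection refused - is the Atlas service running?"
fi
```

If the probe fails, start the Atlas service (e.g. from Ambari) and retry the original operation.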
02-02-2016
02:55 PM
2 Kudos
Yes, you can use Oozie. Let's concentrate on that, because to run queries in parallel efficiently you will most likely need an Oozie workflow anyway, regardless of whether you schedule it with Falcon or with a classic Oozie coordinator (Falcon can kick off Oozie workflows).

An Oozie workflow is an execution graph in which each action can transition to any other node (cycles are forbidden). To allow parallel execution, workflows provide forks and joins: a fork starts two or more actions in parallel, and a join waits for all the actions it depends on to finish. With these building blocks you can create pretty much any structure you want, including a fork inside a fork, and so on. There are surely other ways as well, but Oozie will most likely be the canonical way of doing it. The example below is very simple:

<start to="parallel-load"/>
<fork name="parallel-load">
    <path start="load1"/>
    <path start="load2"/>
</fork>
<action name="load1">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <jdbc-url>jdbc:hive2://hiveserver:10000/default</jdbc-url>
        <password>${hivepassword}</password>
        <script>/data/sql/load1.sql</script>
    </hive2>
    <ok to="join-node"/>
    <error to="kill"/>
</action>
<action name="load2">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <jdbc-url>jdbc:hive2://hiveserver:10000/default</jdbc-url>
        <password>${hivepassword}</password>
        <script>/data/sql/load2.sql</script>
    </hive2>
    <ok to="join-node"/>
    <error to="kill"/>
</action>
<join name="join-node" to="end"/>
<kill name="kill">
    <message>Load failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
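The workflow references a few parameters (${jobTracker}, ${nameNode}, ${hivepassword}) that are typically supplied through a job.properties file at submission time. A minimal sketch, with placeholder hostnames and paths you would replace with your cluster's values:

```properties
# job.properties - placeholder values, adjust for your cluster
nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8050
hivepassword=secret
# HDFS directory containing the workflow.xml above (hypothetical path)
oozie.wf.application.path=${nameNode}/user/oozie/workflows/parallel-load
```

You would then submit the workflow with the Oozie CLI, e.g. `oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run` (11000 is the default Oozie server port).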