Member since: 09-27-2016
Posts: 2
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1840 | 09-27-2016 01:57 PM |
09-27-2016 01:57 PM
Alright, this is quite dumb. Sorry, my bad: it was all about iptables on the third VM host -- I didn't have that in mind, since for HDP iptables is always off.
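For anyone who lands here with the same symptom: a quick way to confirm the fix is a raw TCP probe against the database port from one of the cluster VMs. This is only a sketch -- the 10.255.1.2:5432 endpoint is the one from my question below, everything else is a placeholder:
import socket

# Plain TCP probe against the PostgreSQL port on the database VM.
# While iptables still rejects the traffic, this typically fails with
# "[Errno 113] No route to host" -- the same condition Spark surfaces
# as java.net.NoRouteToHostException.
try:
    conn = socket.create_connection(("10.255.1.2", 5432), timeout=5)
    conn.close()
    print("port 5432 reachable")
except socket.error as exc:
    print("cannot reach database host: %s" % exc)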
09-27-2016 10:20 AM
I have a fresh, small HDP 2.5.0.0 cluster with Spark2 (tech preview) set up. It resides on two virtual machines: one is the master, the other a worker/slave. Using spark-submit, I can deploy a standalone Python Spark application that runs and finishes normally. But when I try to connect via JDBC to a pgSQL database host running on a third VM, Spark is unable to reach that host. My Python code to access the pgSQL database from within the (correctly set up) Spark context looks like this:
probe = spark.read.format('jdbc').options(
    url='jdbc:postgresql://10.255.1.2:5432/gis?user=<theuser>&password=<thepassword>',
    driver='org.postgresql.Driver',
    dbtable='(SELECT * FROM my_db_function({}, {})) AS my_db_function_alias'.format(123, 456)
).load()
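For completeness, a rough skeleton of the script I submit -- the app name, file name and jar path here are placeholders, not my real values:
# Launched roughly like this (paths are placeholders):
#   spark-submit --master yarn --jars /path/to/postgresql-jdbc.jar probe_pgsql.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('pgsql-probe').getOrCreate()

# The JDBC read from above.
probe = spark.read.format('jdbc').options(
    url='jdbc:postgresql://10.255.1.2:5432/gis?user=<theuser>&password=<thepassword>',
    driver='org.postgresql.Driver',
    dbtable='(SELECT * FROM my_db_function({}, {})) AS my_db_function_alias'.format(123, 456)
).load()

probe.show()
spark.stop()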
The driver is provided via the --jars option of spark-submit and is loaded correctly (otherwise its error would be raised earlier), but the connection to the host fails from within the Spark context:
File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 153, in load
File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o68.load.
: org.postgresql.util.PSQLException: Connection attempt failed.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:275)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
[...]
Caused by: java.net.NoRouteToHostException: No route to host
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
I assume this is a NAT/configuration error on one of the virtual machines, but if I log on to either the master or the worker VM, nslookup resolves the hostname against the correct DNS server and returns the correct private IP of my pgSQL database host:
# nslookup my.database.host
Server: 10.255.0.1
Address: 10.255.0.1#53
Name: my.database.host
Address: 10.255.1.2
So from both virtual machines that are part of the HDP installation, the target host is actually reachable. Even standalone, non-cluster, non-Spark Python scripts are able to connect to the database host if I start them on one of the cluster VMs. Are there any settings in Ambari (or elsewhere) to enable this? Does HDP somehow reconfigure the networking?
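To narrow it down further, a check like the following (just a sketch, reusing the endpoint from above) would run a raw TCP probe inside the Spark tasks themselves, i.e. in the YARN containers rather than in a login shell on the VMs:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('connectivity-probe').getOrCreate()
sc = spark.sparkContext

def probe(_):
    # Imported inside the function so it is available on the executors.
    import socket
    try:
        # Attempt a plain TCP connection to the database host from wherever this task runs.
        socket.create_connection(("10.255.1.2", 5432), timeout=5).close()
        return socket.gethostname() + ": ok"
    except socket.error as exc:
        return socket.gethostname() + ": " + repr(exc)

# One partition per task so the probe runs in several containers.
print(sc.parallelize(range(4), 4).map(probe).collect())
spark.stop()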
Labels:
- Apache Spark