<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Cannot connect to VPN host from Spark Job in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-connect-to-VPN-host-from-Spark-Job/m-p/170985#M41923</link>
    <description>&lt;P&gt;I have a fresh, small HDP 2.5.0.0 cluster with Spark2 (tech preview) set up. It resides on two virtual machines: one is the master and the other is a worker/slave.&lt;/P&gt;&lt;P&gt;Using 'spark-submit', I can deploy a standalone Python Spark application that runs and finishes normally. But when I try to connect to a PostgreSQL database host running on a third VM via JDBC, Spark cannot reach the host.&lt;/P&gt;&lt;P&gt;My Python code to access the PostgreSQL database from within the (correctly set up) Spark context looks like this:&lt;/P&gt;&lt;PRE&gt;probe = spark.read.format('jdbc').options(
    url='jdbc:postgresql://10.255.1.2:5432/gis?user=&amp;lt;theuser&amp;gt;&amp;amp;password=&amp;lt;thepassword&amp;gt;',
    driver='org.postgresql.Driver',
    dbtable='(SELECT * FROM my_db_function({}, {})) AS my_db_function_alias'.format(123, 456)
).load()
&lt;/PRE&gt;&lt;P&gt;The driver is provided via the --jars option of spark-submit and is loaded correctly (otherwise a different error would be raised earlier), but I cannot reach the host from within the Spark context:&lt;/P&gt;&lt;PRE&gt;  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 153, in load
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o68.load.
: org.postgresql.util.PSQLException: Connection attempt failed.
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:275)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
[...]
Caused by: java.net.NoRouteToHostException: No route to host
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
&lt;/PRE&gt;&lt;P&gt;I assume this is a NAT/configuration error on one of the virtual machines, but when I log on to either the master or the worker VM, nslookup reaches the DNS server and returns the correct private IP of my PostgreSQL database host:&lt;/P&gt;&lt;PRE&gt;# nslookup my.database.host

Server:         10.255.0.1
Address:        10.255.0.1#53

Name:   my.database.host
Address: 10.255.1.2
&lt;/PRE&gt;&lt;P&gt;So the hostname resolves correctly from both virtual machines that are part of the HDP installation, and the target host is in fact reachable: even standalone, non-cluster, non-Spark Python scripts can connect to the database host when I start them on one of the cluster VMs.&lt;/P&gt;&lt;P&gt;Are there any settings in Ambari (or elsewhere) to enable this? Does HDP somehow reconfigure networking?&lt;/P&gt;</description>
    <pubDate>Tue, 27 Sep 2016 17:20:51 GMT</pubDate>
    <dc:creator>jbendler</dc:creator>
    <dc:date>2016-09-27T17:20:51Z</dc:date>
    <item>
      <title>Cannot connect to VPN host from Spark Job</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-connect-to-VPN-host-from-Spark-Job/m-p/170985#M41923</link>
      <description>&lt;P&gt;I have a fresh, small HDP 2.5.0.0 cluster with Spark2 (tech preview) set up. It resides on two virtual machines: one is the master and the other is a worker/slave.&lt;/P&gt;&lt;P&gt;Using 'spark-submit', I can deploy a standalone Python Spark application that runs and finishes normally. But when I try to connect to a PostgreSQL database host running on a third VM via JDBC, Spark cannot reach the host.&lt;/P&gt;&lt;P&gt;My Python code to access the PostgreSQL database from within the (correctly set up) Spark context looks like this:&lt;/P&gt;&lt;PRE&gt;probe = spark.read.format('jdbc').options(
    url='jdbc:postgresql://10.255.1.2:5432/gis?user=&amp;lt;theuser&amp;gt;&amp;amp;password=&amp;lt;thepassword&amp;gt;',
    driver='org.postgresql.Driver',
    dbtable='(SELECT * FROM my_db_function({}, {})) AS my_db_function_alias'.format(123, 456)
).load()
&lt;/PRE&gt;&lt;P&gt;The driver is provided via the --jars option of spark-submit and is loaded correctly (otherwise a different error would be raised earlier), but I cannot reach the host from within the Spark context:&lt;/P&gt;&lt;PRE&gt;  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 153, in load
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o68.load.
: org.postgresql.util.PSQLException: Connection attempt failed.
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:275)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
[...]
Caused by: java.net.NoRouteToHostException: No route to host
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
&lt;/PRE&gt;&lt;P&gt;I assume this is a NAT/configuration error on one of the virtual machines, but when I log on to either the master or the worker VM, nslookup reaches the DNS server and returns the correct private IP of my PostgreSQL database host:&lt;/P&gt;&lt;PRE&gt;# nslookup my.database.host

Server:         10.255.0.1
Address:        10.255.0.1#53

Name:   my.database.host
Address: 10.255.1.2
&lt;/PRE&gt;&lt;P&gt;So the hostname resolves correctly from both virtual machines that are part of the HDP installation, and the target host is in fact reachable: even standalone, non-cluster, non-Spark Python scripts can connect to the database host when I start them on one of the cluster VMs.&lt;/P&gt;&lt;P&gt;Are there any settings in Ambari (or elsewhere) to enable this? Does HDP somehow reconfigure networking?&lt;/P&gt;</description>
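The stack trace above is diagnostic: name resolution works (nslookup succeeds), yet the connection dies with java.net.NoRouteToHostException, which points at the TCP/routing layer rather than DNS. A minimal Python sketch (a hypothetical helper, not part of the original post) that separates these failure modes from a cluster host:

```python
import errno
import socket

def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt, mirroring the JDBC failure modes:
    DNS failure (UnknownHostException), no route (NoRouteToHostException),
    connection refused, timeout, or success."""
    try:
        # Resolve the name first, just as the JDBC driver would.
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    family, socktype, proto, _, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
        return "ok"
    except ConnectionRefusedError:
        # Host reachable but port closed: often a firewall REJECT rule.
        return "refused"
    except socket.timeout:
        # Frequently a firewall silently DROPping packets.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # This errno is what Java surfaces as NoRouteToHostException.
            return "no-route"
        return "error:" + type(exc).__name__
    finally:
        sock.close()

# Example: probe the database host from the JDBC URL in the question.
# print(probe_tcp("10.255.1.2", 5432))
```

A "no-route" or "timeout" result here, combined with working nslookup, narrows the problem to routing or a host firewall rather than anything Spark- or HDP-specific.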
      <pubDate>Tue, 27 Sep 2016 17:20:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-connect-to-VPN-host-from-Spark-Job/m-p/170985#M41923</guid>
      <dc:creator>jbendler</dc:creator>
      <dc:date>2016-09-27T17:20:51Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot connect to VPN host from Spark Job</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-connect-to-VPN-host-from-Spark-Job/m-p/170986#M41924</link>
      <description>&lt;P&gt;Alright, this is quite dumb. Sorry, my bad:&lt;/P&gt;&lt;P&gt;It was all about iptables on the third VM host -- I didn't have that in mind, since iptables is always off on HDP hosts.&lt;/P&gt;</description>
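For readers hitting the same wall: a NoRouteToHostException despite working DNS is consistent with a host firewall rejecting the traffic. A sketch of checking and opening the port on the database VM, assuming an iptables-based setup and the 10.255.0.0/16 subnet implied by the IPs in the question (subnet and persistence command are assumptions):

```shell
# List current INPUT rules with counters and rule numbers; look for a
# REJECT/DROP rule matching tcp dpt:5432, or a catch-all reject at the end.
sudo iptables -L INPUT -n -v --line-numbers

# Insert an ACCEPT rule for PostgreSQL traffic from the cluster subnet
# (10.255.0.0/16 is an assumption based on the addresses in the question).
sudo iptables -I INPUT -p tcp -s 10.255.0.0/16 --dport 5432 -j ACCEPT

# Persist the rule across reboots (RHEL/CentOS 6-era syntax, matching
# the systems HDP 2.5 typically ran on).
sudo service iptables save
```

Alternatively, disabling iptables entirely on that host (as HDP requires on cluster nodes) would also clear the symptom, at the cost of losing the firewall.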
      <pubDate>Tue, 27 Sep 2016 20:57:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-connect-to-VPN-host-from-Spark-Job/m-p/170986#M41924</guid>
      <dc:creator>jbendler</dc:creator>
      <dc:date>2016-09-27T20:57:05Z</dc:date>
    </item>
  </channel>
</rss>

