<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to connect to multiple hosts using pyhive or impala in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-connect-to-multiple-hosts-using-pyhive-or-impala/m-p/359487#M238135</link>
    <description>&lt;DIV class="votecell post-layout--left"&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;&lt;SPAN&gt;I already have a working connection through ODBC using Cloudera ODBC Driver for Apache Hive, where I have my DSN set and all I need is to call&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;pyodbc.connect(f"DSN={mydsn}", autocommit=True)&lt;/DIV&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;&lt;SPAN&gt;I'd like to use SQLAlchemy, but I'm struggling how to create a working connection url for multiple hosts.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;I was trying to base my idea on&amp;nbsp;&lt;A href="http://www.hadooplessons.info/2017/12/dynami-service-discovery-in-hive-using-zookeeper.html" target="_self"&gt;this guide&lt;/A&gt;&amp;nbsp;where I found connection string &lt;STRONG&gt;jdbc:hive2://datanode1:2181,master1:2181,master2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;However, this&lt;/DIV&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;&lt;PRE&gt;&lt;SPAN class="hljs-keyword"&gt;from&lt;/SPAN&gt; sqlalchemy &lt;SPAN class="hljs-keyword"&gt;import&lt;/SPAN&gt; create_engine
query = &lt;SPAN class="hljs-string"&gt;"""SELECT TOP 10 * from eb.mobile_sa"""&lt;/SPAN&gt;
conn_url = &lt;SPAN class="hljs-string"&gt;f'hive://&lt;SPAN class="hljs-subst"&gt;{UID}&lt;/SPAN&gt;@host1:2181,host2:2181,host3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'&lt;/SPAN&gt;
engine = create_engine(conn_url)
&lt;SPAN class="hljs-keyword"&gt;with&lt;/SPAN&gt; engine.connect() &lt;SPAN class="hljs-keyword"&gt;as&lt;/SPAN&gt; conn:
    df = pd.read_sql(query, conn)&lt;/PRE&gt;&lt;P&gt;throws an error&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;invalid literal for int() with base 10: '2181,host2:2181,host3:2181&lt;SPAN&gt;&amp;nbsp;etc.&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I've also considered&amp;nbsp;&lt;A href="https://github.com/cloudera/impyla" target="_self"&gt;cloudera's impyla&lt;/A&gt;&amp;nbsp;since Pyhive seems to be currently unsupported (as per its github page).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I'm also unable to connect to a single host&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;conn&lt;/SPAN&gt;&lt;SPAN&gt;():&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;    return&lt;/SPAN&gt; &lt;SPAN&gt;connect&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;host&lt;/SPAN&gt;&lt;SPAN&gt;=host1&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;                         port&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;10000&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;timeout&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;20&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;user&lt;/SPAN&gt;&lt;SPAN&gt;=user&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;engine&lt;/SPAN&gt;&lt;SPAN&gt; = &lt;/SPAN&gt;&lt;SPAN&gt;sqlalchemy&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;create_engine&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;'impala://'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;creator&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;conn&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;with&lt;/SPAN&gt; &lt;SPAN&gt;engine&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;connect&lt;/SPAN&gt;&lt;SPAN&gt;() &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt; &lt;SPAN&gt;s&lt;/SPAN&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; query = &lt;SPAN class="hljs-string"&gt;"""SELECT TOP 10 * from eb.mobile_sa"""&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;df&lt;/SPAN&gt;&lt;SPAN&gt; = &lt;/SPAN&gt;&lt;SPAN&gt;pd&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;read_sql&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;query&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;SPAN&gt;s&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;throws an error&lt;/P&gt;&lt;PRE&gt;thrift.transport.TTransport.TTransportException: Could not connect to any of [('10.151.50.42', 10000)]&lt;/PRE&gt;&lt;P&gt;and that's okay, since I'm not using the default port 10000. However, using port 2181 throws this error:&lt;/P&gt;&lt;PRE&gt;sqlalchemy.exc.DBAPIError: (impala.error.HiveServer2Error) Failed after retrying 3 times&lt;BR /&gt;[SQL: SELECT TOP 10 * from eb.mobile_sa]&lt;/PRE&gt;&lt;P&gt;Could you please advise?&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Wed, 14 Dec 2022 11:10:28 GMT</pubDate>
    <dc:creator>Kropiciel</dc:creator>
    <dc:date>2022-12-14T11:10:28Z</dc:date>
    <item>
      <title>How to connect to multiple hosts using pyhive or impala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-connect-to-multiple-hosts-using-pyhive-or-impala/m-p/359487#M238135</link>
      <description>&lt;DIV class="votecell post-layout--left"&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;&lt;SPAN&gt;I already have a working connection through ODBC using Cloudera ODBC Driver for Apache Hive, where I have my DSN set and all I need is to call&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;pyodbc.connect(f"DSN={mydsn}", autocommit=True)&lt;/DIV&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;&lt;SPAN&gt;I'd like to use SQLAlchemy, but I'm struggling how to create a working connection url for multiple hosts.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;I was trying to base my idea on&amp;nbsp;&lt;A href="http://www.hadooplessons.info/2017/12/dynami-service-discovery-in-hive-using-zookeeper.html" target="_self"&gt;this guide&lt;/A&gt;&amp;nbsp;where I found connection string &lt;STRONG&gt;jdbc:hive2://datanode1:2181,master1:2181,master2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;However, this&lt;/DIV&gt;&lt;DIV class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200"&gt;&lt;PRE&gt;&lt;SPAN class="hljs-keyword"&gt;from&lt;/SPAN&gt; sqlalchemy &lt;SPAN class="hljs-keyword"&gt;import&lt;/SPAN&gt; create_engine
query = &lt;SPAN class="hljs-string"&gt;"""SELECT TOP 10 * from eb.mobile_sa"""&lt;/SPAN&gt;
conn_url = &lt;SPAN class="hljs-string"&gt;f'hive://&lt;SPAN class="hljs-subst"&gt;{UID}&lt;/SPAN&gt;@host1:2181,host2:2181,host3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'&lt;/SPAN&gt;
engine = create_engine(conn_url)
&lt;SPAN class="hljs-keyword"&gt;with&lt;/SPAN&gt; engine.connect() &lt;SPAN class="hljs-keyword"&gt;as&lt;/SPAN&gt; conn:
    df = pd.read_sql(query, conn)&lt;/PRE&gt;&lt;P&gt;throws an error&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;invalid literal for int() with base 10: '2181,host2:2181,host3:2181&lt;SPAN&gt;&amp;nbsp;etc.&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I've also considered&amp;nbsp;&lt;A href="https://github.com/cloudera/impyla" target="_self"&gt;cloudera's impyla&lt;/A&gt;&amp;nbsp;since Pyhive seems to be currently unsupported (as per its github page).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I'm also unable to connect to a single host&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;conn&lt;/SPAN&gt;&lt;SPAN&gt;():&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;    return&lt;/SPAN&gt; &lt;SPAN&gt;connect&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;host&lt;/SPAN&gt;&lt;SPAN&gt;=host1&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;                         port&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;10000&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;timeout&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;20&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;user&lt;/SPAN&gt;&lt;SPAN&gt;=user&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;engine&lt;/SPAN&gt;&lt;SPAN&gt; = &lt;/SPAN&gt;&lt;SPAN&gt;sqlalchemy&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;create_engine&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;'impala://'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;creator&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;conn&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;with&lt;/SPAN&gt; &lt;SPAN&gt;engine&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;connect&lt;/SPAN&gt;&lt;SPAN&gt;() &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt; &lt;SPAN&gt;s&lt;/SPAN&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; query = &lt;SPAN class="hljs-string"&gt;"""SELECT TOP 10 * from eb.mobile_sa"""&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;df&lt;/SPAN&gt;&lt;SPAN&gt; = &lt;/SPAN&gt;&lt;SPAN&gt;pd&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;read_sql&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;query&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;SPAN&gt;s&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;throws an error&lt;/P&gt;&lt;PRE&gt;thrift.transport.TTransport.TTransportException: Could not connect to any of [('10.151.50.42', 10000)]&lt;/PRE&gt;&lt;P&gt;and that's okay, since I'm not using the default port 10000. However, using port 2181 throws this error:&lt;/P&gt;&lt;PRE&gt;sqlalchemy.exc.DBAPIError: (impala.error.HiveServer2Error) Failed after retrying 3 times&lt;BR /&gt;[SQL: SELECT TOP 10 * from eb.mobile_sa]&lt;/PRE&gt;&lt;P&gt;Could you please advise?&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 14 Dec 2022 11:10:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-connect-to-multiple-hosts-using-pyhive-or-impala/m-p/359487#M238135</guid>
      <dc:creator>Kropiciel</dc:creator>
      <dc:date>2022-12-14T11:10:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to connect to multiple hosts using pyhive or impala</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-connect-to-multiple-hosts-using-pyhive-or-impala/m-p/383820#M245088</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/102412"&gt;@Kropiciel&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To connect ODBC driver with multiple HiveServers, you can configure High Availability for Hive using LB, refer below doc for same. The use the LB hostname and port in the connection string&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/configuring-apache-hive/topics/hive-ha-loadbalancer.html" target="_blank" rel="noopener"&gt;https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/configuring-apache-hive/topics/hive-ha-loadbalancer.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Let us know if this helps.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As for the python connection use below code and check how it goes.&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;CODE:&lt;/P&gt;&lt;P class="p1"&gt;---------&lt;/P&gt;&lt;P class="p1"&gt;from sqlalchemy import create_engine&lt;/P&gt;&lt;P class="p1"&gt;#Input Information&lt;/P&gt;&lt;P class="p1"&gt;host = hive_hostname&lt;/P&gt;&lt;P class="p1"&gt;port = 10000&lt;/P&gt;&lt;P class="p1"&gt;schema = schema_name&lt;/P&gt;&lt;P class="p1"&gt;table = table_name&lt;/P&gt;&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;#Execution&lt;/P&gt;&lt;P class="p1"&gt;engine = create_engine(f'hive://{host}:{port}/{schema}')&lt;/P&gt;&lt;P class="p1"&gt;engine.execute(QUERY)&lt;/P&gt;</description>
      <pubDate>Fri, 23 Feb 2024 14:23:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-connect-to-multiple-hosts-using-pyhive-or-impala/m-p/383820#M245088</guid>
      <dc:creator>jAnshula</dc:creator>
      <dc:date>2024-02-23T14:23:16Z</dc:date>
    </item>
  </channel>
</rss>

