I already have a working connection through ODBC using Cloudera ODBC Driver for Apache Hive, where I have my DSN set and all I need is to call pyodbc.connect(f"DSN={mydsn}", autocommit=True)
I'd like to use SQLAlchemy, but I'm struggling how to create a working connection url for multiple hosts.
I was trying to base my idea on
this guide where I found connection string
jdbc:hive2://datanode1:2181,master1:2181,master2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2However, this
from sqlalchemy import create_engine
query = """SELECT TOP 10 * from eb.mobile_sa"""
conn_url = f'hive://{UID}@host1:2181,host2:2181,host3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
engine = create_engine(conn_url)
with engine.connect() as conn:
df = pd.read_sql(query, conn)
throws an error
invalid literal for int() with base 10: '2181,host2:2181,host3:2181 etc.
I've also considered cloudera's impyla since Pyhive seems to be currently unsupported (as per its github page).
I'm also unable to connect to a single host
def conn():
return connect(host=host1,
port=10000,
timeout=20,
user=user)
engine = sqlalchemy.create_engine('impala://', creator=conn)
with engine.connect() as s:
query = """SELECT TOP 10 * from eb.mobile_sa""""
df = pd.read_sql(query,s)
throws an error
thrift.transport.TTransport.TTransportException: Could not connect to any of [('10.151.50.42', 10000)]
and that's okay, since I'm not using the default port 10000. However, using port 2181 throws this error:
sqlalchemy.exc.DBAPIError: (impala.error.HiveServer2Error) Failed after retrying 3 times
[SQL: SELECT TOP 10 * from eb.mobile_sa]
Could you please advise?