Member since
12-14-2022
1
Post
0
Kudos Received
0
Solutions
12-14-2022
03:10 AM
I already have a working connection through ODBC using Cloudera ODBC Driver for Apache Hive, where I have my DSN set and all I need is to call pyodbc.connect(f"DSN={mydsn}", autocommit=True) I'd like to use SQLAlchemy, but I'm struggling how to create a working connection url for multiple hosts. I was trying to base my idea on this guide where I found connection string jdbc:hive2://datanode1:2181,master1:2181,master2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 However, this from sqlalchemy import create_engine
query = """SELECT TOP 10 * from eb.mobile_sa"""
conn_url = f'hive://{UID}@host1:2181,host2:2181,host3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
engine = create_engine(conn_url)
with engine.connect() as conn:
df = pd.read_sql(query, conn) throws an error invalid literal for int() with base 10: '2181,host2:2181,host3:2181 etc. I've also considered cloudera's impyla since Pyhive seems to be currently unsupported (as per its github page). I'm also unable to connect to a single host def conn(): return connect(host=host1, port=10000, timeout=20, user=user) engine = sqlalchemy.create_engine('impala://', creator=conn) with engine.connect() as s: query = """SELECT TOP 10 * from eb.mobile_sa"""" df = pd.read_sql(query,s) throws an error thrift.transport.TTransport.TTransportException: Could not connect to any of [('10.151.50.42', 10000)] and that's okay, since I'm not using the default port 10000. However, using port 2181 throws this error: sqlalchemy.exc.DBAPIError: (impala.error.HiveServer2Error) Failed after retrying 3 times [SQL: SELECT TOP 10 * from eb.mobile_sa] Could you please advise?
... View more
Labels:
- Labels:
-
Apache Hive