Created 05-12-2023 03:21 AM
I've been testing the phoenixdb library in Python with CDP Operational DataStore cluster. Following the example here:
I can make a connection and return a single row but the connection is very slow to establish and execute. I'm running phoenixdb from the python interpreter on the COD cluster gateway host. The timestamped python execution is below.
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
>>> import phoenixdb
>>> from datetime import datetime
>>> opts = {}
>>> opts["authentication"] = "BASIC"
>>> opts["serialization"] = "PROTOBUF"
>>> opts["avatica_user"] = "workload-username"
>>> opts["avatica_password"] = "workload-password"
>>> jbdc_connection_url = "https://cod--<clusterID>-gateway0.<envID>.<custID>.cloudera.site/cod--<clusterID>/cdp-proxy-api/avatica/"
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:54:45
>>> conn = phoenixdb.connect(jbdc_connection_url, autocommit=True, **opts)
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:54:51
>>> sql_command = "SELECT * FROM DBNAME.TABLENAME WHERE CUSTID = ?"
>>> data = ("xxxxxxxx", )
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:54:51
>>> cursor = conn.cursor()
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:54:51
>>> cursor.execute(sql_command, data)
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:55:00
>>> results_list = cursor.fetchall()
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:55:00
>>> conn.close()
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:55:06
>>> results_list
[['xxxxxxxx', '...', '...', ...]]
You can see that it takes 6 seconds to establish the connection to Phoenix. It takes 9 second to execute the SQL command and another 6 seconds to close the connection.
Any thoughts on why this might be the case and if I can make changes to address this? Thanks.
Created 05-17-2023 08:23 AM
Hi @stephen_obrien As discussed via Cloudera case, there is a performance bottleneck when connecting via knox ( tracked in internal jira ) than directly from phoenix-sqlline from the edge node.
You can test the same when the runtime version 7.2.17 is released.
Created 05-17-2023 08:23 AM
Hi @stephen_obrien As discussed via Cloudera case, there is a performance bottleneck when connecting via knox ( tracked in internal jira ) than directly from phoenix-sqlline from the edge node.
You can test the same when the runtime version 7.2.17 is released.