Support Questions

Find answers, ask questions, and share your expertise

Connections to CDP COD via python phoenixdb very slow

avatar
New Contributor

I've been testing the phoenixdb library in Python with CDP Operational DataStore cluster. Following the example here:

https://community.cloudera.com/t5/Community-Articles/Using-phoenixdb-to-connect-Cloudera-Machine-Lea...

I can make a connection and return a single row but the connection is very slow to establish and execute. I'm running phoenixdb from the python interpreter on the COD cluster gateway host. The timestamped python execution is below.

 

Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
>>> import phoenixdb
>>> from datetime import datetime
>>> opts = {}
>>> opts["authentication"] = "BASIC"
>>> opts["serialization"] = "PROTOBUF"
>>> opts["avatica_user"] = "workload-username"
>>> opts["avatica_password"] = "workload-password"
>>> jbdc_connection_url = "https://cod--<clusterID>-gateway0.<envID>.<custID>.cloudera.site/cod--<clusterID>/cdp-proxy-api/avatica/"
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:54:45
>>> conn = phoenixdb.connect(jbdc_connection_url, autocommit=True, **opts)
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:54:51
>>> sql_command = "SELECT * FROM DBNAME.TABLENAME WHERE CUSTID = ?"
>>> data = ("xxxxxxxx", )
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:54:51
>>> cursor = conn.cursor()
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:54:51
>>> cursor.execute(sql_command, data)
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:55:00
>>> results_list = cursor.fetchall()
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:55:00
>>> conn.close()
>>> print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
2023-05-12 09:55:06
>>> results_list
[['xxxxxxxx', '...', '...', ...]]

You can see that it takes 6 seconds to establish the connection to Phoenix. It takes 9 second to execute the SQL command and another 6 seconds to close the connection.

Any thoughts on why this might be the case and if I can make changes to address this? Thanks.

 

 

 

1 ACCEPTED SOLUTION

avatar
Super Collaborator

Hi @stephen_obrien As discussed via Cloudera case, there is a performance bottleneck when connecting via knox ( tracked in internal jira ) than directly from phoenix-sqlline from the edge node.

 

You can test the same when the runtime version 7.2.17 is released.

View solution in original post

1 REPLY 1

avatar
Super Collaborator

Hi @stephen_obrien As discussed via Cloudera case, there is a performance bottleneck when connecting via knox ( tracked in internal jira ) than directly from phoenix-sqlline from the edge node.

 

You can test the same when the runtime version 7.2.17 is released.