Member since
11-06-2016
5
Posts
2
Kudos Received
0
Solutions
03-30-2020
09:54 AM
The DSN-less connection string below FINALLY worked for me, in windows 10. I created a file DSN, then copy/pasted the string into the python code, as a template. Three lessons that I learned from this struggle: 1) kerberos is CASE SENSITIVE. Your kerberos realm in the string MUST be uppercase. 2) The Cloudera driver doesn't like spaces in between the semicolons in the string. Avoid them. 3) If you don't need connection pooling, turn it off with a pyodbc.pooling = False statement. import pyodbc strFileDSNAsAstring = "DRIVER=Cloudera ODBC Driver for Apache Hive;USEUNICODESQLCHARACTERTYPES=1; \ SSL=0;SERVICEPRINCIPALCANONICALIZATION=0;SERVICEDISCOVERYMODE=0;SCHEMA=database;PORT=port; \ KRBSERVICENAME=hive;KRBREALM=uppercaserealm;KRBHOSTFQDN=hostfqdndomain;INVALIDSESSIONAUTORECOVER=1; \ HOST=host;HIVESERVERTYPE=2;GETTABLESWITHQUERY=0;ENABLETEMPTABLE=0;DESCRIPTION=Hive; \ DELEGATEKRBCREDS=0;AUTHMECH=1;ASYNCEXECPOLLINTERVAL=100;APPLYSSPWITHQUERIES=1;CAIssuedCertNamesMismatch=1;" try: pyodbc.pooling = False conn = pyodbc.connect(strFileDSNAsAstring, autocommit=True) except: print("failure.") else: conn.close() print("success.")
... View more
04-25-2017
04:24 PM
Not sure if this is the problem, but how many executors are working on the insert (when viewing your job via the Spark UI)? Are you setting executor-cores?
... View more