About siddharth_peesa

pudnik26354 · ‎03-30-2020

The DSN-less connection string below FINALLY worked for me, in windows 10. I created a file DSN, then copy/pasted the string into the python code, as a template. Three lessons that I learned from this struggle: 1) kerberos is CASE SENSITIVE. Your kerberos realm in the string MUST be uppercase. 2) The Cloudera driver doesn't like spaces in between the semicolons in the string. Avoid them. 3) If you don't need connection pooling, turn it off with a pyodbc.pooling = False statement. import pyodbc strFileDSNAsAstring = "DRIVER=Cloudera ODBC Driver for Apache Hive;USEUNICODESQLCHARACTERTYPES=1; \ SSL=0;SERVICEPRINCIPALCANONICALIZATION=0;SERVICEDISCOVERYMODE=0;SCHEMA=database;PORT=port; \ KRBSERVICENAME=hive;KRBREALM=uppercaserealm;KRBHOSTFQDN=hostfqdndomain;INVALIDSESSIONAUTORECOVER=1; \ HOST=host;HIVESERVERTYPE=2;GETTABLESWITHQUERY=0;ENABLETEMPTABLE=0;DESCRIPTION=Hive; \ DELEGATEKRBCREDS=0;AUTHMECH=1;ASYNCEXECPOLLINTERVAL=100;APPLYSSPWITHQUERIES=1;CAIssuedCertNamesMismatch=1;" try: pyodbc.pooling = False conn = pyodbc.connect(strFileDSNAsAstring, autocommit=True) except: print("failure.") else: conn.close() print("success.")

tshiels · ‎04-25-2017

Not sure if this is the problem, but how many executors are working on the insert (when viewing your job via the Spark UI)? Are you setting executor-cores?

Online	Offline
Last Visited	‎04-02-2017 01:55 AM

Member Since	‎11-06-2016 07:08 PM
Last Visited	‎04-02-2017 01:55 AM
Posts	5
Kudos received	2

Cloudera Community

Re: Python connect to Hadoop using Hive with Kerbe...

Re: spark: Insert overwrite is very very slow in s...