Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

pyHive interface does not seem to work for more difficult queries

avatar

I followed the article on

https://community.hortonworks.com/articles/97062/query-hive-using-python.html

BUT...this only seems to work for small queries "show databases", "show tables", "SELECT * FROM X",....when I tried a JOIN like...

'SELECT * from spin1.patient JOIN cohortdefinitions.cohort_first25_spin1db ON patient.id=cohort_first25_spin1db.id'

It fails. with

>>>cursor.execute(

'SELECT * from spin1.patient JOIN cohortdefinitions.cohort_first25_spin1db ON patient.id=cohort_first25_spin1db.id')

Traceback (most recent call last):

File "<ipython-input-22-32a836d67b2f>", line 1, in <module> cursor.execute(connInfo['query'])

File "/home/ace4user/anaconda/lib/python2.7/site-packages/pyhive/hive.py", line 240, in execute _check_status(response)

File "/home/ace4user/anaconda/lib/python2.7/site-packages/pyhive/hive.py", line 362, in _check_status raise OperationalError(response)

OperationalError: TExecuteStatementResp(status=TStatus(errorCode=1, errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask', sqlState='08S01', infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask:28:27', 'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:335', 'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:199', 'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:228', 'org.apache.hive.service.cli.operation.Operation:run:Operation.java:276', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:468', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:444', 'sun.reflect.GeneratedMethodAccessor211:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:498', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1724', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy25:executeStatement::-1', 'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:273', 'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:506', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1317', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1302', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617', 'java.lang.Thread:run:Thread.java:745'], statusCode=3), operationHandle=None)

ANY IDEAS????

2 REPLIES 2

avatar

Did something that worked for me...

from pyhive import hive
conn = hive.Connection(host='<myhost>',
			port='<myport>',
			database='spin1',
			username='<a valid user>')  # IMPORTANT**
cursor = conn.cursor()
print cursor.fetchall()

IMPORTANT** - Before I was not using a valid username, I used 'hive' because of some earlier example I found....and eventhough simple queries worked...the more complex queries failed.

Think this may have to do with write permissions...but don't know. Reason why I say this is that When I tried the same thing using beeline it threw a similar but different error that included something about write permissions being denied. when I switched to a valid username, the query worked in beeline.

Probably a stupid mistake on my part...but wanted to write it up in case someone else repeats it.

avatar
Master Guru

Thanks for the write up. It is probably permissions required for writing temp space and Hive datawarehouse HDFS structure. Always good to use permissions. If you had kerberos or required logins or ran with a different user you may face issues as well.