Created 05-08-2017 04:39 PM
I followed the article on
https://community.hortonworks.com/articles/97062/query-hive-using-python.html
BUT...this only seems to work for small queries "show databases", "show tables", "SELECT * FROM X",....when I tried a JOIN like...
'SELECT * from spin1.patient JOIN cohortdefinitions.cohort_first25_spin1db ON patient.id=cohort_first25_spin1db.id'
It fails. with
>>>cursor.execute(
'SELECT * from spin1.patient JOIN cohortdefinitions.cohort_first25_spin1db ON patient.id=cohort_first25_spin1db.id')
Traceback (most recent call last):
File "<ipython-input-22-32a836d67b2f>", line 1, in <module> cursor.execute(connInfo['query'])
File "/home/ace4user/anaconda/lib/python2.7/site-packages/pyhive/hive.py", line 240, in execute _check_status(response)
File "/home/ace4user/anaconda/lib/python2.7/site-packages/pyhive/hive.py", line 362, in _check_status raise OperationalError(response)
OperationalError: TExecuteStatementResp(status=TStatus(errorCode=1, errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask', sqlState='08S01', infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask:28:27', 'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:335', 'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:199', 'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:228', 'org.apache.hive.service.cli.operation.Operation:run:Operation.java:276', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:468', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:444', 'sun.reflect.GeneratedMethodAccessor211:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:498', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1724', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy25:executeStatement::-1', 'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:273', 'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:506', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1317', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1302', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617', 'java.lang.Thread:run:Thread.java:745'], statusCode=3), operationHandle=None)
ANY IDEAS????
Created 05-08-2017 05:31 PM
Did something that worked for me...
from pyhive import hive conn = hive.Connection(host='<myhost>', port='<myport>', database='spin1', username='<a valid user>') # IMPORTANT** cursor = conn.cursor() print cursor.fetchall()
IMPORTANT** - Before I was not using a valid username, I used 'hive' because of some earlier example I found....and eventhough simple queries worked...the more complex queries failed.
Think this may have to do with write permissions...but don't know. Reason why I say this is that When I tried the same thing using beeline it threw a similar but different error that included something about write permissions being denied. when I switched to a valid username, the query worked in beeline.
Probably a stupid mistake on my part...but wanted to write it up in case someone else repeats it.
Created 05-08-2017 05:39 PM
Thanks for the write up. It is probably permissions required for writing temp space and Hive datawarehouse HDFS structure. Always good to use permissions. If you had kerberos or required logins or ran with a different user you may face issues as well.