issue trying Impyla

Explorer

I am trying the sample impyla code from

http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/

and getting "impala.error.HiveServer2Error: Failed after retrying 3 times".

 

impyla is installed on the Hadoop (CDH-5.3.2) node I log in to.

 

 

Tried:

from impala.dbapi import connect
conn = connect(host='my.impala.host', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM youval_db.accounts_info LIMIT 10')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()

For "my.impala.host" I used the Impala host I got from Cloudera Manager.

(tried with host from the following groups: Impala Catalog Server Default Group, Impala Daemon Default Group and Impala StateStore Default Group)

 

I got the same error for all of them.
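Before debugging impyla itself, it can help to confirm that the chosen host actually accepts TCP connections on the port passed to connect(). Port 21050 is the HiveServer2 port served by Impala daemons; the catalog server and statestore do not accept client queries. The snippet below is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical, and an open port only shows that something is listening, not that authentication will succeed.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example (host name taken from the post above, replace with a real daemon host):
# print(port_is_open('my.impala.host', 21050))
```

If this returns False for every host, the problem is likely network or service placement rather than the Python client.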

 

Also tried with 

conn = connect()

It did not work either.

 

Any suggestions on how to make it work?

Thanks 

8 REPLIES 8

New Contributor

I have been getting the same error when trying to connect to the Impala instance on a kerberized cluster. Is there any particular reason why we get this?

Expert Contributor

Has anyone found an answer for this? I am getting the same error when I run the code below. This is a Kerberos cluster, and Impala works fine through Hue and ODBC:

--------------------

from impala.dbapi import connect
conn = connect(host='myhost', port=21050)

cursor = conn.cursor()
cursor.execute('SELECT * FROM default.testtable')
print (cursor.description) # prints the result set's schema
results = cursor.fetchall()

 

 

avatar

I believe that error should be fixed with the most recent releases of Impyla (0.16.1) and thrift_sasl (0.4.2)
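To see whether an environment already has at least those releases, the installed versions can be read from package metadata. This is a rough sketch, assuming Python 3.8+ (for `importlib.metadata`) and that the packages are registered under the distribution names `impyla` and `thrift_sasl`; the helper names and the simple numeric version comparison are illustrative only and ignore pre-release suffixes.

```python
import re
from importlib import metadata


def installed_versions(names, lookup=metadata.version):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = lookup(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_tuple(version):
    """Turn '0.16.1' into (0, 16, 1), using the first three numeric fields."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def meets_minimum(installed, minimum):
    """True if `installed` is present and not older than `minimum`."""
    return installed is not None and version_tuple(installed) >= version_tuple(minimum)


# Example:
# found = installed_versions(['impyla', 'thrift_sasl'])
# print(found, meets_minimum(found['impyla'], '0.16.1'))
```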

Expert Contributor

Thanks, you are a genius 🙂 .
Installing thrift-sasl 0.4.2 and impyla 0.16.2 let the script run successfully. However, I now have a different issue: the call cursor.fetchmany(size=3) hangs indefinitely in a Jupyter notebook. A similar PyHive script returns immediately on the same small table.

from impala.dbapi import connect
conn = connect(host='myhost', port=21050, auth_mechanism='GSSAPI', kerberos_service_name='impala')
cursor = conn.cursor()
cursor.execute('SELECT * FROM default.mytable LIMIT 100')
cursor.fetchmany(size=3)
cursor.close()
conn.close()

Cloudera Manager's Impala Queries monitor shows the query status as Executing, but the query details say Query State: FINISHED.

The hang seems to be in the statement buff = self.sock.recv(sz)

/data/opt/anaconda3/lib/python3.7/site-packages/thriftpy2/transport/socket.py in read(self, sz)
    107         while True:
    108             try:
--> 109                 buff = self.sock.recv(sz)
    110             except socket.error as e:
    111                 if e.errno == errno.EINTR:

KeyboardInterrupt: 

 

After trying various options and setting timeout=100 in the connect call, the script appears to query the Impala table successfully, but every second or third run it fails with the error below:

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in _rpc(self, func_name, request)
    992         response = self._execute(func_name, request)
    993         self._log_response(func_name, response)
--> 994         err_if_rpc_not_ok(response)
    995         return response
    996 

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in err_if_rpc_not_ok(resp)
    746             resp.status.statusCode != TStatusCode.SUCCESS_WITH_INFO_STATUS and
    747             resp.status.statusCode != TStatusCode.STILL_EXECUTING_STATUS):
--> 748         raise HiveServer2Error(resp.status.errorMessage)
    749 
    750 

HiveServer2Error: Invalid query handle: b14cce8e19xxxx:5b51463xxxx

 Any more thoughts?
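For repeated runs in a notebook, one thing worth ruling out is cursors or connections left open by an earlier, interrupted cell. The sketch below wraps the same calls used above so that the cursor and connection are always closed, even if fetchmany raises. It is only an illustration: the helper name `fetch_sample` is hypothetical, the connect function is passed in as an argument so the helper has no hard dependency on impyla, and nothing here is known to fix the hang or the "Invalid query handle" error.

```python
from contextlib import closing


def fetch_sample(connect, query, size=3, **connect_kwargs):
    """Run `query` and return up to `size` rows, always closing cursor and connection."""
    with closing(connect(**connect_kwargs)) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(query)
            return cursor.fetchmany(size)


# Example call against a real cluster (requires impyla and a valid Kerberos ticket):
# from impala.dbapi import connect
# rows = fetch_sample(connect, 'SELECT * FROM default.mytable LIMIT 100',
#                     host='myhost', port=21050, auth_mechanism='GSSAPI',
#                     kerberos_service_name='impala', timeout=100)
```

Passing timeout through connect keeps a stalled socket read from blocking the notebook forever, which matches the timeout=100 setting tried above.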

New Contributor

Hi, we're experiencing the same issue as above - "Invalid query handle" error on thrift-sasl 0.4.2 with kerberos auth. Everything works fine on thrift-sasl 0.2.1.

 

Was there any resolution?

 

 

Expert Contributor

Different versions of thrift-sasl and impyla seem to work or not work together, and it is not easy to figure out the mismatches. We finally abandoned impyla and went with pyodbc and the Cloudera Impala ODBC driver, which was easier to get working and has worked well so far. Check out this link: https://plenium.wordpress.com/2020/05/04/use-pyodbc-with-cloudera-impala-odbc-and-kerberos/

New Contributor

Have you solved this problem?

New Contributor

@JasonBourne - if you have the same issue, here's a GitHub issue discussing it and linking to a pull request to fix it:
https://github.com/cloudera/thrift_sasl/issues/28

You can see in the commits (here: https://github.com/cloudera/thrift_sasl/commits/master), they are testing a new release for a fix, but it looks like it's not quite done yet. Hopefully soon.