R client connection fails with CDP Impala

Explorer

I am having trouble with the Impala connection from R with the following error:

 

'Error: nanodbc/nanodbc.cpp:983: 00000: [unixODBC][Cloudera][DriverSupport] (1100) SSL certificate verification failed because the certificate is missing or incorrect.'

 

Here are the connection details in our code:

 

impala = src_impala(
     drv = drv,
     driver = "/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so",
     host = "cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site",
     database = db,
     port = 21050,
     uid = username,
     pwd = password,
     AuthMech = 3,
     transportMode = "http",
     httpPath = "cdp-tdh-de3/cdp-proxy-api/impala",
     ssl = 1,
     sslTrustStore = "/home/csso_innovation.cdh/gateway-client-trust.jks"
)
return(impala)

 

Any help is appreciated.

Thanks,

Gozde


Expert Contributor

Hello Gozde @gfragkos ,

Have you checked whether connectivity works with the given sslTrustStore file from a Java-based client (for example, beeline)?

As I see it, your application uses unixODBC to connect to a CDP / Impala service. However, the shared connection details show that the truststore is a Java keystore (JKS) file. Since "nanodbc.cpp" is not part of a Java-based application, it probably cannot recognize that file as a valid truststore. Please try a truststore file in "pem" format instead.
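If it is unclear which format a given truststore file actually has, its first bytes usually tell. The sketch below is only an illustration: `detect_truststore_format` is a hypothetical helper, not part of any Cloudera tooling. It relies on two facts: JKS keystores begin with the magic bytes FE ED FE ED, and PEM certificate bundles contain a `-----BEGIN CERTIFICATE-----` line.

```python
from pathlib import Path

# Magic number at the start of a Java keystore (JKS) file.
JKS_MAGIC = bytes.fromhex("feedfeed")
# Header line that opens each certificate in a PEM bundle.
PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def detect_truststore_format(path):
    """Return 'jks', 'pem', or 'unknown' based on the file's contents."""
    data = Path(path).read_bytes()
    if data.startswith(JKS_MAGIC):
        return "jks"
    if PEM_MARKER in data:
        return "pem"
    return "unknown"
```

Calling `detect_truststore_format` on the path passed to the driver shows whether the file is a Java keystore or a PEM bundle. Other formats, such as PKCS#12, are reported as "unknown" by this sketch.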

 

Please also review the Impala ODBC Driver documentation:

https://downloads.cloudera.com/connectors/impala_odbc_2.6.14.1016/Cloudera-ODBC-Connector-for-Impala...

 

Thanks

 Miklos

Explorer

Hi Miklos,

 

Thanks a lot for your message.

I edited the code as follows and tried again but it gives the same error. Do you think there is something missing or wrong in the code below?

 

drv = odbc::odbc()

impala = src_impala(
drv = drv,
driver = "/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so",
host = "cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site",
database = db,
port = 21050,
uid = username,
pwd = password,
KrbRealm = "",
KrbFQDN = "",
KrbServiceName = "impala",
AuthMech = 3,
ssl = 1,
sslTrustStore="/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem"
)
return(impala)

 

Thanks a lot,

Gozde

 

Expert Contributor

Hi @gfragkos, thanks for checking. Let's step back then. Is the Impala service TLS/SSL enabled at all? Can you verify that with openssl tools, like:

echo | openssl s_client -connect cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site:21050 -CAfile /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
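The same check can be scripted with Python's standard `ssl` module, for example from a CML session. This is a minimal sketch under stated assumptions: `verify_tls` is a hypothetical helper name, and the host, port, and CA file are the ones from the openssl command above. Unlike `openssl s_client`, Python's default context also checks that the certificate matches the hostname, so this test is slightly stricter.

```python
import socket
import ssl


def verify_tls(host, port, cafile, timeout=10.0):
    """Open a TLS connection and verify the server chain against cafile.

    Returns the negotiated protocol version string.
    Raises ssl.SSLCertVerificationError if the chain or hostname does not verify.
    """
    # With cafile given, only the CAs in that file are trusted.
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, `verify_tls("cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site", 21050, "/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem")` either returns the protocol version or raises an exception that names the verification problem.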

 

Explorer

Hi @mszurap,

Thanks again for your message. I don't fully understand the output, but I think it is enabled. I see output like the following:

CONNECTED(00000003)
depth=2 O = CDP-TDH.U5TE-1STU.CLOUDERA.SITE, CN = Certificate Authority
verify return:1
depth=1 CN = cdp-tdh-de3-master0
verify return:1
depth=0 C = US, ST = CA, CN = cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site
verify return:1
---
Certificate chain
0 s:/C=US/ST=CA/CN=cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site
i:/CN=cdp-tdh-de3-master0
1 s:/CN=cdp-tdh-de3-master0
i:/O=CDP-TDH.U5TE-1STU.CLOUDERA.SITE/CN=Certificate Authority
2 s:/O=CDP-TDH.U5TE-1STU.CLOUDERA.SITE/CN=Certificate Authority
i:/O=CDP-TDH.U5TE-1STU.CLOUDERA.SITE/CN=Certificate Authority
---
Server certificate

----begin certificate

....

----end certificate

No client certificate CA names sent
---
SSL handshake has read 3743 bytes and written 735 bytes
---
New, TLSv1/SSLv3, Cipher is AES256-GCM-SHA384
Server public key is 3072 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : ....
Session-ID: ....
Session-ID-ctx:
Master-Key: ....
Key-Arg : None
Krb5 Principal: None
PSK identity: None
PSK identity hint: None
TLS session ticket lifetime hint: 300 (seconds)
TLS session ticket:
0000 - bf 32 96 28 05 ee 1f 7a-d5 f9 54 27 ab 86 9d 0e .2.(...z..T'....
0010 - b0 0b f9 43 99 1d f6 8d-9b 0e ec 26 25 95 8d 5c ...C.......&%..\
0020 - 39 ed 4b 81 59 28 6a 11-59 71 b0 c4 41 55 fa 22 9.K.Y(j.Yq..AU."
0030 - 3c 3f e7 33 ae e6 a0 79-d7 64 a5 20 10 bd 45 da <?.3...y.d. ..E.
0040 - ef af 1d 8a 4a af 86 a7-04 61 55 d0 6b f8 1d c4 ....J....aU.k...
0050 - a0 d5 8d c1 b2 94 ad b0-b9 b8 c7 29 a3 43 e7 a4 ...........).C..
0060 - 7f 71 60 05 27 3f 3b b3-50 74 b6 57 54 b5 0a ab .q`.'?;.Pt.WT...
0070 - 5e 3f f5 d3 62 91 88 35-f6 c9 a4 3a 51 b5 f9 12 ^?..b..5...:Q...
0080 - 37 99 49 aa 06 80 c0 9f-23 67 93 fd 6b 45 8f 74 7.I.....#g..kE.t
0090 - 3a 2d f5 e1 0a e3 ea 41-f2 8a 48 ec ac 21 c8 84 :-.....A..H..!..

Start Time: 1651561803
Timeout : 300 (sec)
Verify return code: 0 (ok)
---
DONE

 

Explorer

Hi @mszurap ,

Thanks for your message,

Using the code you sent, from the output, I see that it is enabled. 

I also checked from Cloudera Manager, Impala service page that TLS/SSL for Impala is enabled and the pem files are also located in the correct folder.

 

Expert Contributor

Thanks for checking. Is the connection successful using other clients, like impala-shell, beeline and other JDBC clients?

Explorer

Hi @mszurap,

 

Thanks for still following 🙂 By using the following command, I successfully connect and see the tables:

 

impala-shell --ssl --protocol='hs2-http' -i cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site:443 --http_path="cdp-tdh-de3/cdp-proxy-api/impala" -u csso_innovation.cdh -l

 

Or from CML by using the following python code I can connect and see the tables:

import ibis

hdfs = ibis.hdfs_connect(host='cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site',
port=9871,
auth_mechanism='GSSAPI',
use_https = True,
verify = False)

con = ibis.impala.connect(host='cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site',
hdfs_client=hdfs,
database='default',
kerberos_service_name='impala',
auth_mechanism='GSSAPI',
use_ssl=True,
ca_cert="/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem")

con.list_tables(database='db')

I think there is something wrong in the R code I sent in my initial message, but I cannot figure out what 🙂

Explorer

Actually, the python code I sent above is failing when I use the pem file with the error below:

TTransportException: failed to initialize SSL

If I remove the ca_cert field, it runs and I see the tables.

 

Explorer

I noticed that I had used the wrong parameter name, 'sslTrustStore', for the pem file. I replaced it with 'TrustedCerts' and the certificate verification failure is gone. But now I have the following error:
 

Error in odbc_connect(connection_string, timezone = timezone, timezone_out = timezone_out,  :
  ignoring SIGPIPE signal
Calls: impala_connect ... dbConnect -> .local -> OdbcConnection -> odbc_connect
Execution halted 
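To illustrate the parameter rename described above (the SIGPIPE error is a separate matter), here is a small sketch of how connection options could be normalized before they are joined into an ODBC connection string. `RENAMED_KEYS` and `build_connection_string` are hypothetical names introduced only for this example. The single mapping reflects the fix found in this thread, not a complete list of driver options.

```python
# Hypothetical mapping: the key that the driver did not act on,
# and the key that resolved the certificate error in this thread.
RENAMED_KEYS = {"sslTrustStore": "TrustedCerts"}


def build_connection_string(params):
    """Join key/value pairs into an ODBC-style connection string.

    Keys listed in RENAMED_KEYS are replaced by their corrected names.
    Values containing ';' would need extra quoting, which this sketch omits.
    """
    parts = []
    for key, value in params.items():
        key = RENAMED_KEYS.get(key, key)
        parts.append(f"{key}={value}")
    return ";".join(parts)
```

With this helper, `{"Host": "h", "SSL": 1, "sslTrustStore": "/path/ca.pem"}` becomes `Host=h;SSL=1;TrustedCerts=/path/ca.pem`.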

 

Explorer

@mszurap, I'm not sure if you are still following the issue, but I wanted to post an update here in case you have other suggestions 🙂

 

Following the ODBC documentation from Cloudera (https://docs.cloudera.com/documentation/other/connectors/impala-odbc/2-6-14/Cloudera-ODBC-Connector-...), I modified the .odbc.ini file in my home folder as follows:

 

[ODBC Data Sources]
Sample DSN=Cloudera ODBC Driver for Impala 64-bit

[Sample DSN]
driver = /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
host = host
database = db
port = 21050
KrbRealm = realm
KrbFQDN = fqdn
KrbServiceName = impala
AuthMech = 1
ssl = 1
TrustedCerts = /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem

 

My connection works and I can see all our tables.
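A DSN definition like the one above can be sanity-checked before use. The following is a rough sketch with the standard `configparser` module. `REQUIRED_KEYS` and `missing_dsn_keys` are hypothetical names, and the list of required keys is an assumption based on the settings shown in this thread, not on the driver documentation.

```python
import configparser

# Assumed minimum set of keys, taken from the working DSN in this thread.
REQUIRED_KEYS = ("driver", "host", "port", "ssl", "trustedcerts")


def missing_dsn_keys(ini_text, dsn):
    """Return the required keys that are absent from the given DSN section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(dsn):
        raise KeyError(f"DSN section not found: {dsn}")
    # configparser lowercases option names by default,
    # so 'TrustedCerts' in the file matches 'trustedcerts' here.
    return [key for key in REQUIRED_KEYS if not parser.has_option(dsn, key)]
```

A section that still uses sslTrustStore instead of TrustedCerts would be reported as missing `trustedcerts`, which is the mistake that caused the original certificate error.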

However, using the same connection string from our R code, I get the following error when it comes to writing a table in Impala:

 

Warning:
object '.__C__impala_connection' not found
Error: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'commcare_typed' not found

The code also has a Spark connection:

 

spark = spark_connect(master="yarn") 

 

and writes the table in the R code with the spark_write_table function.

 

Any ideas about why our connection does not seem to be working?

 

 

Expert Contributor

Hi! Sorry, but this seems to be an R-specific usage problem that I cannot help with.

What you can do is enable DEBUG/TRACE level logging on the ODBC driver side (please check the ODBC Driver documentation for how to do it); you may find further clues there.
