Continuing my series of how-to articles for CDP, today we explore how to connect to Impala via JDBC in Python. In my example, I will use a Jupyter notebook running in CML, but the same steps apply to other Python environments.
This process is actually fairly easy, so let's dive in.
Step 1: Set up Impala JDBC drivers
- First, download the latest Impala JDBC drivers from Cloudera JDBC Driver 2.6.17 for Impala.
- Then, upload them to your machine. Here is an example of a CML Jupyter session with the jars uploaded:
- Finally, make sure that you set up your CLASSPATH properly by opening a terminal session and typing the following:
CLASSPATH=.:/home/cdsw/ImpalaJDBC4.jar:/home/cdsw/ImpalaJDBC41.jar:/home/cdsw/ImpalaJDBC42.jar
export CLASSPATH
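Before moving on, it can help to confirm that the jars really are where the CLASSPATH says they are. The following is a minimal sketch, assuming the jars were uploaded to /home/cdsw as in this article; the helper names `build_classpath` and `missing_jars` are made up for illustration and are not part of any Cloudera tooling.

```python
import os


def build_classpath(jar_dir, jar_names):
    """Join the current directory and the given jars into a CLASSPATH string."""
    entries = ["."] + [os.path.join(jar_dir, name) for name in jar_names]
    return os.pathsep.join(entries)


def missing_jars(jar_dir, jar_names):
    """Return the jar names that are not present in jar_dir."""
    return [name for name in jar_names
            if not os.path.isfile(os.path.join(jar_dir, name))]


# Assumption: upload location and jar names as used in this article.
JAR_DIR = "/home/cdsw"
JAR_NAMES = ["ImpalaJDBC4.jar", "ImpalaJDBC41.jar", "ImpalaJDBC42.jar"]

print("CLASSPATH would be:", build_classpath(JAR_DIR, JAR_NAMES))
print("Missing jars:", missing_jars(JAR_DIR, JAR_NAMES))
```

An empty "Missing jars" list suggests the upload worked; any name listed there would need to be uploaded again or removed from the CLASSPATH.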
Step 2: Install JayDeBeApi
- To install JayDeBeApi, run the following:
pip3 install JayDeBeApi
- To avoid an error along the lines of "AttributeError: type object 'java.sql.Types' has no attribute '__javaclass__'", it is recommended to downgrade JPype by running the following:
pip3 install --upgrade jpype1==0.6.3 --user
- Restart your kernel after performing the downgrade.
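After the kernel restart, a quick check can confirm that the downgrade actually took effect in the environment the notebook uses. This is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`) and that the distribution is registered under the name "JPype1"; `installed_version` is a hypothetical helper, not part of JayDeBeApi or JPype.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


PINNED = "0.6.3"  # the version recommended in Step 2
found = installed_version("JPype1")  # assumption: distribution name is "JPype1"

if found is None:
    print("JPype1 is not installed in this environment")
elif found == PINNED:
    print("JPype1 is pinned as expected:", found)
else:
    print(f"JPype1 is {found}, expected {PINNED}; the downgrade may not have taken effect")
```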
Step 3: Connect to Impala
- Finally, connect to Impala using the following sample code:
import jaydebeapi
conn = jaydebeapi.connect("com.cloudera.impala.jdbc.DataSource",
                          "jdbc:impala://[your_host]:443/;ssl=1;transportMode=http;httpPath=icml-data-mart/cdp-proxy-api/impala;AuthMech=3;",
                          {'UID': "[your_cdp_user]", 'PWD': "[your_workload_pwd]"},
                          '/home/cdsw/ImpalaJDBC41.jar')
curs = conn.cursor()
curs.execute("select * from default.locations")
curs.fetchall()
curs.close()
conn.close()
Note: You can get your Impala JDBC string either from the Datahub endpoint path or from the JDBC URL in CDW.
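If a query fails partway through, the sample code would leave the cursor and connection open. One possible way to guard against that is to wrap the same calls in try/finally, as sketched below. The function name `run_query` and its `connect` parameter are assumptions made for illustration: `connect` defaults to `jaydebeapi.connect` and exists only so the helper can be exercised without a JVM.

```python
def run_query(sql, url, user, password, jar_path, connect=None):
    """Run one statement and return (column_names, rows), always closing resources."""
    if connect is None:
        # Imported lazily so the helper can be loaded without JayDeBeApi installed.
        import jaydebeapi
        connect = jaydebeapi.connect

    # Same driver class and credential keys as the sample code in Step 3.
    conn = connect("com.cloudera.impala.jdbc.DataSource", url,
                   {"UID": user, "PWD": password}, jar_path)
    try:
        curs = conn.cursor()
        try:
            curs.execute(sql)
            # description may be None for statements that return no result set.
            columns = [col[0] for col in (curs.description or [])]
            return columns, curs.fetchall()
        finally:
            curs.close()
    finally:
        conn.close()
```

Usage would look like `columns, rows = run_query("select * from default.locations", url, "[your_cdp_user]", "[your_workload_pwd]", "/home/cdsw/ImpalaJDBC41.jar")`, where `url` is the JDBC string described in the note above.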
The following is a screenshot of my code in action:
Created on 03-23-2021 09:59 AM
Is there any way to have CDSW connect to Impala to run straight SQL?
I have searched many tutorials and suggestions on the web and have found none that work with CDSW in our environment.
Created on 03-24-2021 07:20 PM - edited 03-24-2021 07:22 PM
Hello,
Nice tutorial, this library is fast!
If anyone is running into
java.sql.SQLExceptionPyRaisable: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500605) Error occurred while opening a session with the server. No additional detail from the server regarding this error is available. Please ensure that the driver configuration is compatible with the server configuration. This type of error can also occur when the server is too busy to handle the request. Please try again later.
I was able to fix it by changing the httpPath parameter in the Impala JDBC URL from "icml-data-mart/cdp-proxy-api/impala" to "cliservice", as follows:
"jdbc:impala://"+os.environ["IMPALA_HOST"]+":443/;ssl=1;transportMode=http;httpPath=cliservice;AuthMech=3;"
Hope this helps anyone!
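Since the right httpPath seems to depend on the environment, one option is to keep it configurable rather than hard-coded. This is a minimal sketch: `build_impala_url` is a hypothetical helper, IMPALA_HOST is the variable used in the line above, and IMPALA_HTTP_PATH is an assumed variable name introduced only for this example.

```python
import os


def build_impala_url(host, http_path="cliservice", port=443):
    """Assemble the HTTP-transport JDBC URL in the form used in this thread."""
    return (f"jdbc:impala://{host}:{port}/;ssl=1;transportMode=http;"
            f"httpPath={http_path};AuthMech=3;")


# IMPALA_HOST comes from the comment above; IMPALA_HTTP_PATH is hypothetical.
host = os.environ.get("IMPALA_HOST", "[your_host]")
http_path = os.environ.get("IMPALA_HTTP_PATH", "cliservice")

print(build_impala_url(host, http_path))
```

Switching between "cliservice" and "icml-data-mart/cdp-proxy-api/impala" then only requires changing the environment variable, not the notebook code.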
Created on 08-19-2021 06:24 AM
Hello, I am running this from the company network, and I believe we have some sort of certificate for using cloudera-impala. When I copy the URL from impala_prod, it also includes at the end a UID (which is my ID) and a password, which is a standard password (not one I set at any point).
So when I run this script, this is the error I receive:
java.sql.SQLException: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500170) Error occurred while setting up ALTUS Dynamic Discovery: Unable to load credentials from provider files.
Do you have any ideas on how I can fix this?