Created 06-01-2023 09:28 AM
Dear community,
I am writing to you because I have the following problem: how can I connect to Impala tables from Cloudera Data Science Workbench (CDSW) in a session that uses R 3.x or R 4.x? I managed to make the connection using Python, but I need to do it using R as well.
I have the connection host, the port, and the authentication mechanism. For example, the following Python code connects successfully:
"from impala.dbapi import connect
conn = connect(host='host_name', port=21050, auth_mechanism='GSSAPI', use_ssl=True, kerberos_service_name='impala')
cursor = conn.cursor()"
Now I need to make the same connection using R in Cloudera Data Science Workbench.
Does anyone know how to translate the above code to R 3.x or R 4.x? I already tried using the 'RImpala' library, but it was removed from the CRAN repository according to https://cran.r-project.org/web/packages/RImpala/index.html.
If anyone has an idea how to help me I would be very grateful.
Created 06-01-2023 12:45 PM
@asandovala21 Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our CDSW experts @Gopinath and @Mike who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,
Created 06-01-2023 02:39 PM
Hi, the documentation suggests using sparklyr to read from Impala in R:
https://docs.cloudera.com/cdsw/1.10.3/import-data/topics/cdsw-running-queries-on-impala-tables.html#...
I think you could also use the RJDBC library to set up a JDBC connection to Impala:
https://cran.r-project.org/web/packages/RJDBC/index.html
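As a rough sketch of the RJDBC route (not something I have tested on your cluster), the Python parameters from the question could map onto a JDBC URL roughly like this. It assumes the Cloudera Impala JDBC driver jar has been downloaded into the project; the jar path, the driver class name, and the helper names `impala_jdbc_url` and `connect_impala` are placeholders I made up for illustration, and the URL properties (`AuthMech`, `KrbHostFQDN`, `KrbServiceName`, `SSL`) should be checked against the documentation of the driver version you install. Depending on your Kerberos setup you may also need `KrbRealm` and a valid ticket in the session.

```r
# Build a JDBC URL for the Cloudera Impala JDBC driver (assumed property names).
# AuthMech=1 selects Kerberos, mirroring auth_mechanism='GSSAPI' in the Python code;
# SSL=1 mirrors use_ssl=True; KrbServiceName mirrors kerberos_service_name.
impala_jdbc_url <- function(host, port = 21050, krb_service = "impala") {
  paste0(
    "jdbc:impala://", host, ":", port,
    ";AuthMech=1",
    ";KrbHostFQDN=", host,
    ";KrbServiceName=", krb_service,
    ";SSL=1"
  )
}

# Open a DBI connection through RJDBC.
# driver_jar and driver_class are assumptions: adjust both to the driver
# version you actually downloaded.
connect_impala <- function(host, driver_jar,
                           driver_class = "com.cloudera.impala.jdbc.Driver") {
  drv <- RJDBC::JDBC(
    driverClass = driver_class,
    classPath = driver_jar,
    identifier.quote = "`"
  )
  DBI::dbConnect(drv, impala_jdbc_url(host))
}

# Example usage (hypothetical jar path; needs a valid Kerberos ticket):
# conn <- connect_impala("host_name", "/path/to/ImpalaJDBC42.jar")
# DBI::dbGetQuery(conn, "SHOW DATABASES")
# DBI::dbDisconnect(conn)
```

Once connected, `DBI::dbGetQuery()` returns a data frame, so it plays roughly the role that the cursor plays in the Python code.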
Created 06-05-2023 06:32 AM
@asandovala21 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres,