New Contributor
Posts: 5
Registered: ‎05-01-2015
Accepted Solution

Impala ODBC drivers in base image?

Hello!

 

We are still getting familiar with CDSW. One thing I'm wondering is whether anyone knows why the Cloudera ODBC drivers are not included in the base image.

 

We currently run our data science jobs on a Linux edge node. Although Spark is useful, we still do a lot of data preparation against Impala over ODBC, using the odbc package in R and the turbodbc package in Python.
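For readers unfamiliar with this setup, here is a minimal sketch of how such a connection string might be assembled in Python. The driver name, host, port 21050, and `AuthMech=0` are assumptions for illustration, not values from our cluster; the driver name must match whatever is registered in your `odbcinst.ini`.

```python
def impala_odbc_connection_string(host, port=21050,
                                  driver="Cloudera ODBC Driver for Impala",
                                  **options):
    """Build a semicolon-separated ODBC connection string.

    The default driver name and port are assumptions; adjust them to
    match the driver registered on your system and your cluster setup.
    """
    parts = {"Driver": driver, "Host": host, "Port": str(port)}
    # Extra driver options (for example AuthMech) are appended as given.
    parts.update({key: str(value) for key, value in options.items()})
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical host; AuthMech=0 (no authentication) is only a placeholder.
conn_str = impala_odbc_connection_string("impala.example.com", AuthMech=0)
print(conn_str)

# With turbodbc installed and the Impala driver registered, the string
# could then be passed on like this (not executed here):
#   from turbodbc import connect
#   connection = connect(connection_string=conn_str)
```

This only works once both an ODBC driver manager and the Impala driver are present in the environment, which is exactly the gap discussed in this thread.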

 

I was hoping that the Impala ODBC driver would be included in the base image, but that does not appear to be the case. Unfortunately, I also found out that you cannot install OS packages directly (no root access). The only option is to extend the base image and build a custom image.

 

Custom images are certainly useful, but they require admin intervention. It feels a bit strange that this is needed to deploy Cloudera software.

 

Similarly, the documentation states that CDSW "currently" does not support customization of system packages that require root access. I am wondering whether there is already a roadmap for this, and how allowing data scientists to install OS packages would work.

 

Thanks!

New Contributor
Posts: 4
Registered: ‎02-15-2018

Re: Impala ODBC drivers in base image?


I have the same issue here. Our enterprise data warehouse is in Teradata, and the fastest way to push data out of simulations into it is through ODBC drivers. However, I was surprised to find that this isn't supported: there is no root-level access, and no ODBC driver manager is pre-installed in the base image. It would be very helpful if either iODBC or unixODBC came pre-installed.

Cloudera Employee
Posts: 40
Registered: ‎04-28-2017

Re: Impala ODBC drivers in base image?

The current best approach to installing system packages is to use the custom engine functionality. This does require administrator access, however.

 

I have also filed a feature request internally to package the Impala ODBC drivers and ensure that the base ODBC drivers are part of the standard base image.

 

Thanks,

Tristan

New Contributor
Posts: 4
Registered: ‎02-15-2018

Re: Impala ODBC drivers in base image?

I'm not sure whether root access is necessary for the driver itself, though I suppose it is. I do know that it is absolutely necessary for the driver manager.

 

I am not even sure who would have administrator privileges. Is there a way to file a ticket and get support on this?

Cloudera Employee
Posts: 40
Registered: ‎04-28-2017

Re: Impala ODBC drivers in base image?

The base image does include the following system packages:

unixodbc-bin
libsqliteodbc
odbc-postgresql
tdsodbc

To customize the base image further, you'd need to contact your CDSW
administrator and request a package or configuration be added to the
engine. JDBC is often easier to configure without root access, if that is
an option for your use case.
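
As a quick check that needs no root access, a session can ask the dynamic loader whether an ODBC driver manager library is visible. This is a minimal sketch using only the Python standard library; the library names `odbc` (unixODBC) and `iodbc` (iODBC) are the conventional ones and are assumed here, and a result of "not found" only means the loader could not locate the library on its default search path.

```python
from ctypes.util import find_library


def find_odbc_driver_managers():
    """Return a mapping of driver manager name to the located library.

    The value is None when the loader cannot find the library. The
    library names are the conventional ones and are assumptions.
    """
    candidates = {"unixODBC": "odbc", "iODBC": "iodbc"}
    return {name: find_library(lib) for name, lib in candidates.items()}


available = find_odbc_driver_managers()
for name, library in available.items():
    print(f"{name}: {library or 'not found'}")
```

If neither library is found, that is a concrete detail to pass along when asking the administrator for an engine change.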

Tristan
New Contributor
Posts: 5
Registered: ‎05-01-2015

Re: Impala ODBC drivers in base image?

Tristan, thanks for adding it to your list.

 

JDBC is indeed something that users will be able to set up without admin intervention. That should help us out in the meantime.

Within R/tidyverse, the odbc package is becoming popular, and although it still has a few problems, we are pushing users toward that approach.

 

In the context of making Impala available to CDSW users, would it be possible to preconfigure impala-shell in the container with the cluster info? It's a small thing, and we can specify all the info when using it, but explaining all that to users takes focus and time away from getting things done.

In our case we will also use impala-shell in a small internal package to allow transfer of data from R back to an Impala table with a single function.
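
Until something like that exists, one possible workaround is to generate an impala-shell configuration file per user. The sketch below renders an `[impala]` section and builds a command line that points impala-shell at it. The host, port, database name, and file path are hypothetical, and the function names are my own, not part of any Cloudera tooling.

```python
import configparser
import io


def render_impalarc(impalad, default_db=None):
    """Render the text of an impala-shell config file.

    `impalad` is a host:port string. The values passed in below are
    placeholders, not real cluster details.
    """
    config = configparser.ConfigParser()
    config["impala"] = {"impalad": impalad}
    if default_db:
        config["impala"]["default_db"] = default_db
    buffer = io.StringIO()
    # Write key=value without spaces around the delimiter.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


def impala_shell_query_command(query, config_file=None):
    """Build an argument list suitable for subprocess.run."""
    command = ["impala-shell"]
    if config_file:
        command.append(f"--config_file={config_file}")
    command += ["-q", query]
    return command


rc_text = render_impalarc("impalad.example.com:21000", default_db="analytics")
command = impala_shell_query_command("SHOW TABLES",
                                     config_file="/home/cdsw/.impalarc")
print(rc_text)
print(command)
```

Writing `rc_text` to the chosen path and passing `command` to `subprocess.run` would complete the picture; a small internal package could wrap both steps so users never see the connection details.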

 

Thanks!

Bruno
