Reach MS SQL via ODBC in Python within CDSW
Created on 07-11-2018 02:31 PM - edited 09-16-2022 06:27 AM
Hello, we have seen some samples for using pyodbc in Python within CDSW at the bottom of this page (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_import_data.html).
I tried this and it failed, even after I corrected the URL pointing to the library on Google. I am not knowledgeable enough to debug further.
We have also received some guidance related to unixODBC
(https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-RHEL-or-Centos).
We are told we must first install this into the CDSW edge node environment. We could do that, but have not yet, because I am confused by the existence of two options.
Has anybody succeeded in using either of these ODBC drivers to reach back into an MS SQL database instance and pull data directly into CDSW via SQL queries? If so, which did you use, and would you be willing to share your detailed steps and experience? Thank you!
Created 12-03-2019 06:20 PM
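The solution is to build your own custom CDSW engine image with the SQL Server ODBC driver and pyodbc installed.
# Built on CDSW engine v8
Start from the v8 base engine used with CDSW 1.6, which runs Ubuntu 16.04: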
```
$ cat /etc/issue
Ubuntu 16.04.6 LTS \n \l
```
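The Ubuntu release matters because the Microsoft package repository used in the Dockerfile below is published per release. As a minimal sketch, assuming the `/etc/issue` banner format shown above and the repository URL pattern that appears in that Dockerfile, a small helper can derive the matching `prod.list` URL. The function name `repo_list_url` is hypothetical and only for illustration.

```python
import re


def repo_list_url(issue_text):
    """Return the Microsoft prod.list URL for the Ubuntu release named in /etc/issue text."""
    # Capture only major.minor (e.g. "16.04" from "Ubuntu 16.04.6 LTS").
    match = re.search(r"Ubuntu (\d+\.\d+)", issue_text)
    if match is None:
        raise ValueError("not an Ubuntu /etc/issue banner: %r" % issue_text)
    return "https://packages.microsoft.com/config/ubuntu/%s/prod.list" % match.group(1)


# Usage inside an engine session (reads the real banner file):
#     with open("/etc/issue") as f:
#         print(repo_list_url(f.read()))
```

If the base engine ever moves to a different Ubuntu release, the hard-coded `16.04` in the repository URL would need to change accordingly.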
## Built from [mssql-docker](https://github.com/microsoft/mssql-docker/blob/master/oss-drivers/pyodbc/Dockerfile)
Dockerfile:
```
FROM docker.repository.cloudera.com/cdsw/engine:8
LABEL maintainer="IQVIA"

# apt-get and system utilities
RUN apt-get update && apt-get install -y \
    curl apt-utils apt-transport-https debconf-utils gcc build-essential g++-5 \
    && rm -rf /var/lib/apt/lists/*

# add the Microsoft package repository
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list

# install SQL Server drivers
RUN apt-get update && ACCEPT_EULA=Y apt-get install -y msodbcsql unixodbc-dev

# install SQL Server tools
RUN apt-get update && ACCEPT_EULA=Y apt-get install -y mssql-tools
# put the tools on PATH for every process; sourcing ~/.bashrc in a RUN step does not persist
ENV PATH="${PATH}:/opt/mssql-tools/bin"

# upgrade pip
RUN pip install --upgrade pip

# install the Python SQL Server connector module, pyodbc
RUN pip install pyodbc
```
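
Once a session runs on the custom engine, querying SQL Server from Python might look like the sketch below. It is not a tested recipe for any particular environment. The driver name `ODBC Driver 13 for SQL Server` is an assumption about what the `msodbcsql` package registers; `pyodbc.drivers()` lists the names actually installed. The helper names, host, database, and credentials are hypothetical placeholders.

```python
def build_connection_string(server, database, user, password,
                            driver="ODBC Driver 13 for SQL Server"):
    """Assemble an ODBC connection string for SQL Server.

    The default driver name is an assumption; confirm it with pyodbc.drivers().
    """
    # Wrapping the password in braces lets it contain ';' or '=';
    # a literal '}' inside the braces is escaped by doubling it.
    safe_password = "{" + password.replace("}", "}}") + "}"
    parts = [
        "DRIVER={%s}" % driver,
        "SERVER=%s" % server,
        "DATABASE=%s" % database,
        "UID=%s" % user,
        "PWD=%s" % safe_password,
    ]
    return ";".join(parts)


def fetch_rows(connection_string, sql):
    """Run one query and return all result rows."""
    import pyodbc  # imported lazily: only present in the custom engine

    conn = pyodbc.connect(connection_string)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        # Close explicitly; pyodbc's "with conn:" commits but does not close.
        conn.close()


# Usage with placeholder values:
#     conn_str = build_connection_string("sqlhost.example.com,1433", "mydb", "myuser", "secret")
#     rows = fetch_rows(conn_str, "SELECT TOP 5 name FROM sys.tables")
```

Keeping the string assembly separate from the connection call makes it easy to print and inspect the connection string (minus the password) when a connection fails.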
