Error message from ODBC connection
Labels: Apache Impala, Apache Kudu
Created on ‎04-24-2019 07:25 AM - edited ‎09-16-2022 07:20 AM
Hi,
I keep getting error messages when querying Impala through the ODBC connector (using the pyodbc package for Python):
pyodbc.OperationalError: ('08S01', '[08S01] [Cloudera][ImpalaODBC] (120) Error while retrieving data from in Impala: [08S01] : ImpalaThriftAPICallFailed (120) (SQLFetch)')
I have a smaller table (table1) and a bigger table (table2).
With the smaller table I tried 1 and 10 processes and everything worked fine.
When I started 50 parallel Python processes, each with its own pyodbc connection to Impala, I got the error message above after a few seconds (when calling cursor.fetchmany(1000)).
With the bigger table, I got the error even with 1 process.
Client Setup:
Windows 10 + official Impala ODBC driver
The Python program creates a process, connects to Impala with pyodbc and executes queries for 3 minutes. It then closes the cursor and the connection.
Cluster Setup:
1 master + 4 tablet servers
Impala 3.1.0, Kudu 1.8.0 (CDH 6.1 with default parameters)
Data stored in Kudu table1: ~0.7 × 10^9 rows (with 7 columns)
Data stored in Kudu table2: ~18 × 10^9 rows (with 7 columns)
Additional notes:
I also got this error when using CentOS 7 + the official Impala ODBC driver.
With LIMIT 100 on the queries I still got this error, although it took longer to appear than it did without the limit.
With the JDBC connector, everything worked fine for 1, 10 and 50 processes.
Created ‎05-09-2019 01:33 AM
Each Impala Daemon can only accept a finite total number of active client connections, which is likely what you are running into.
Typically for concurrent access to a DB, it is better to use a connection pooling pattern, with a finite number of connections shared between the threads of a single application. This avoids overloading the target server.
While I haven't used it, pyodbc may support connection pooling and reuse, which you can use from threads in Python instead of creating separate processes.
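As a rough illustration of the pooling pattern, here is a minimal sketch of a fixed-size pool shared by worker threads. It is not pyodbc's own pooling mechanism: ConnectionPool, run_query and run_concurrently are hypothetical names made up for this example, `connect` is any zero-argument callable that returns a DB-API connection, and the DSN in the usage comment is an assumption.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool: at most `size` connections are ever open."""

    def __init__(self, connect, size):
        # `connect` is a zero-argument callable returning a DB-API connection.
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        conn = self._idle.get()   # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse

    def close(self):
        # Call only after all workers have finished.
        while not self._idle.empty():
            self._idle.get_nowait().close()


def run_query(pool, sql, batch_size=1000):
    """Run one query on a pooled connection and return the number of rows read."""
    rows = 0
    with pool.connection() as conn:
        cursor = conn.cursor()
        try:
            cursor.execute(sql)
            while True:
                batch = cursor.fetchmany(batch_size)
                if not batch:
                    break
                rows += len(batch)
        finally:
            cursor.close()
    return rows


def run_concurrently(pool, queries, workers):
    """Run many queries from `workers` threads that share the same pool."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(lambda sql: run_query(pool, sql), queries))


# Usage (assumed DSN name, requires pyodbc and a configured driver):
#   import pyodbc
#   pool = ConnectionPool(lambda: pyodbc.connect("DSN=impala", autocommit=True), size=8)
#   counts = run_concurrently(pool, ["SELECT * FROM table1 LIMIT 100"] * 50, workers=50)
#   pool.close()
```

With this shape, 50 worker threads can issue queries while the Impala daemon only ever sees as many connections as the pool size. A production pool would also need to discard connections that have failed rather than returning them to the queue.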
Alternatively, spread the connections around, either by introducing a load balancer, or by varying the target options for each spawned process. See https://www.cloudera.com/documentation/enterprise/latest/topics/impala_dedicated_coordinator.html and http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf for further guidance and examples on this.
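If a load balancer is not available, varying the target per spawned process could look roughly like the sketch below, which assigns each worker a daemon host round-robin. The helper name connection_string and the host names are hypothetical; the driver name, port 21050 and AuthMech=0 (no authentication) are assumptions that need to match your own driver installation and cluster security settings.

```python
def connection_string(worker_index, hosts, port=21050,
                      driver="Cloudera ODBC Driver for Impala"):
    """Build an ODBC connection string, choosing the host round-robin."""
    host = hosts[worker_index % len(hosts)]
    # The doubled braces emit the literal { } that ODBC expects around the driver name.
    return f"Driver={{{driver}}};Host={host};Port={port};AuthMech=0"


# Hypothetical host names; replace with your own Impala daemons.
IMPALA_HOSTS = ["impalad-1.example.com", "impalad-2.example.com",
                "impalad-3.example.com", "impalad-4.example.com"]

# Each of the 50 worker processes would then connect with something like:
#   pyodbc.connect(connection_string(worker_index, IMPALA_HOSTS), autocommit=True)
```

With 4 hosts and 50 workers, each daemon receives 12 or 13 connections instead of one daemon receiving all 50.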
