About Tim Armstrong

Tim Armstrong · ‎09-19-2018

Hi @Svyat, I tested with Impala JDBC 2.6.4 and it appears to be resolved. I now get this output from my test program with the two different versions of the query:

    Running query 1: select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text from test.a;
    col0= (null=false) col1=a (null=false) col2=a (null=false)
    Running query 2: select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text, text = 'a' from test.a;
    col0= (null=false) col1=a (null=false) col2=a (null=false)

Test code is:

    import java.sql.*;

    public class JDBCRegex {
      // JDBC driver name and database URL
      static final String JDBC_DRIVER = "com.cloudera.impala.jdbc41.Driver";
      static final String DB_URL = "jdbc:impala://localhost:21050/";

      public static void main(String[] args) {
        Connection conn = null;
        Statement stmt = null;
        try {
          Class.forName(JDBC_DRIVER);
          System.out.println("Connecting to a selected database...");
          conn = DriverManager.getConnection(DB_URL, "", "");
          System.out.println("Connected database successfully...");
          System.out.println("Creating statement...");
          stmt = conn.createStatement();

          String sql = "select regexp_extract(text, '\\w', 0), regexp_extract(text, '\\\\w', 0), text from test.a;";
          ResultSet rs = stmt.executeQuery(sql);
          System.out.println("Running query 1: " + sql);
          while (rs.next()) {
            System.out.println("col0=" + rs.getString(1) + " (null=" + rs.wasNull() + ") " +
                "col1=" + rs.getString(2) + " (null=" + rs.wasNull() + ") " +
                "col2=" + rs.getString(3) + " (null=" + rs.wasNull() + ") ");
          }
          rs.close();

          // Add an unrelated comparison expression.
          sql = "select regexp_extract(text, '\\w', 0), regexp_extract(text, '\\\\w', 0), text, text = 'a' from test.a;";
          System.out.println("Running query 2: " + sql);
          rs = stmt.executeQuery(sql);
          while (rs.next()) {
            System.out.println("col0=" + rs.getString(1) + " (null=" + rs.wasNull() + ") " +
                "col1=" + rs.getString(2) + " (null=" + rs.wasNull() + ") " +
                "col2=" + rs.getString(3) + " (null=" + rs.wasNull() + ") ");
          }
          rs.close();
        } catch (SQLException se) {
          // Handle errors for JDBC
          se.printStackTrace();
        } catch (Exception e) {
          // Handle errors for Class.forName
          e.printStackTrace();
        } finally {
          // finally block used to close resources
          try {
            if (stmt != null) stmt.close();
          } catch (SQLException se) {
            // do nothing
          }
          try {
            if (conn != null) conn.close();
          } catch (SQLException se) {
            se.printStackTrace();
          }
        }
      }
    }

I ran from the command line with:

    javac JDBCRegex.java && CLASSPATH=~/ClouderaImpalaJDBC-2.6.4.1005/ImpalaJDBC41.jar:. time java JDBCRegex
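For anyone puzzled by why col0 comes back empty while col1 is "a", here is a rough sketch of the escaping layers involved. It assumes the server treats backslash as an escape character inside SQL string literals, which is my reading of the output above rather than a description of Impala's parser. The helper unescapeSqlLiteral is hypothetical and simplified (it just drops the backslash), and java.util.regex stands in for the regex engine Impala actually uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeLayers {
    // Hypothetical, simplified stand-in for SQL string-literal unescaping:
    // a backslash is dropped and the character after it is kept as-is.
    // This is an assumption for illustration, not Impala's actual code.
    static String unescapeSqlLiteral(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                out.append(body.charAt(++i)); // keep the escaped character
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Returns the first match of regex in input, or "" when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(0) : "";
    }

    public static void main(String[] args) {
        // Java source "\\w" is the two-character SQL text \w;
        // Java source "\\\\w" is the three-character SQL text \\w.
        String[] sqlLiteralBodies = {"\\w", "\\\\w"};
        for (String body : sqlLiteralBodies) {
            String regex = unescapeSqlLiteral(body);
            System.out.println("SQL literal '" + body + "' -> regex " + regex
                + " -> first match on a: [" + firstMatch(regex, "a") + "]");
        }
    }
}
```

Under that assumption the first literal ends up as the plain regex w, which does not match the value a, and the second ends up as \w, which does. That is the same pattern as col0 and col1 in the output.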

Tim Armstrong · ‎09-18-2018

Thanks for letting us know about this, I'll see what I can do to get it fixed in a future release.

Tim Armstrong · ‎09-11-2018

We don't support this in Impala right now. We'd generally recommend using Hive to prepare such data for querying.

Tim Armstrong · ‎09-10-2018

@BorjaRodriguez that looks like a different known issue that we've seen with incremental stats on tables with large numbers of columns. It's fixed in CDH5.13+.

Tim Armstrong · ‎09-06-2018

Untracked memory is really anything that isn't explicitly tracked by query execution. We track all of the large amounts of memory used by query execution: buffers for reading from disk, the actual row data, hash tables in joins, etc. The untracked memory should be small relative to that. It's mostly overhead for control structures like the runtime profile. That's usually small, but it can add up easily when lots of queries are left open.

It looks like something is unhealthy there. For one, there are a lot of queries that are still hanging around. I'd guess that there's a client that is misbehaving and not closing queries once it is finished with them. Those queries look like they were probably cancelled (or had all the results fetched) but were not closed by the client.

One workaround for problems like that is to set an idle session timeout to periodically clear out user sessions that are not active: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_timeouts.html
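On the client side, if the application talks to Impala through JDBC, one way to make sure statements and result sets are always closed is try-with-resources. The following is only a sketch: the class and method names are made up for illustration, the JDBC URL and query are supplied by the caller, and I'm assuming that closing the JDBC Statement is what releases the query on the server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class CloseQueries {
    // Counts the rows of a query. The try-with-resources block closes the
    // ResultSet and then the Statement on every exit path, including exceptions.
    static int countRows(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws SQLException {
        // args[0]: JDBC URL, args[1]: query text (both supplied by the caller).
        try (Connection conn = DriverManager.getConnection(args[0])) {
            System.out.println("rows=" + countRows(conn, args[1]));
        }
    }
}
```

The idle session timeout is still worth setting as a backstop for clients you don't control.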

Tim Armstrong · ‎09-06-2018

@Tomas79 Impala should definitely be able to stay under a query mem_limit by spilling to disk! I've put a lot of work into making that work more reliably over the last couple of years. I don't know what version you were experimenting with, but it's gotten much better over time. We had incremental improvements in most releases from CDH5.5 to CDH5.12, then a big improvement in CDH5.13 (there was a big rework of the spill-to-disk code), and I expect we'll see another big increase in reliability in CDH6.1 based on the work we've already done.

My rules of thumb for what to expect are:

CDH5.9+: spilling is fairly reliable if the query has plenty of memory (300MB+ for each spilling agg and join, adequate memory for other query operators).

CDH5.13+: spilling of aggregation, hash join and sort (i.e. the most memory-hungry operators) is more reliable because those operators reserve memory for spilling and have lower minimum memory requirements. Other operators like SCAN need to have adequate memory to run in-memory. "Memory limit exceeded" should be rare if each query has a mem_limit of multiple GBs, e.g. 2GB+.

CDH6.1+: it should be much harder to hit a "memory limit exceeded" after a query starts running. We've fixed a lot of edge cases, particularly in the SCAN operators.

The most common reason I see for queries failing to spill is that there are concurrent queries running without mem_limits set, or the mem_limits add up to more than the process mem_limit. This has gotten better (and 6.1 should have further improvements), but we do rely on memory-based admission control being configured with mem_limits set to completely avoid OOM scenarios from multiple queries competing for the same memory.
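To make that last point concrete, here is the arithmetic as a small sketch. It is not how Impala's admission control is implemented; the class, the method name and the convention that 0 means "no mem_limit set" are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class MemLimitCheck {
    static final long GB = 1024L * 1024L * 1024L;

    // Returns true only when every concurrent query has a mem_limit and the
    // limits together fit inside the process mem_limit.
    // In this sketch a limit of 0 stands for "no mem_limit set".
    static boolean fitsInProcessLimit(List<Long> queryMemLimits, long processMemLimit) {
        long total = 0;
        for (long limit : queryMemLimits) {
            if (limit <= 0) {
                return false; // an unlimited query can compete for all the memory
            }
            total += limit;
        }
        return total <= processMemLimit;
    }

    public static void main(String[] args) {
        long processLimit = 16 * GB;
        System.out.println(fitsInProcessLimit(Arrays.asList(2 * GB, 4 * GB, 8 * GB), processLimit));
        System.out.println(fitsInProcessLimit(Arrays.asList(8 * GB, 8 * GB, 2 * GB), processLimit));
        System.out.println(fitsInProcessLimit(Arrays.asList(2 * GB, 0L), processLimit));
    }
}
```

Only the first combination is safe in this sense. The second oversubscribes the process limit and the third includes a query with no limit at all.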

Tim Armstrong · ‎08-23-2018

We have a CDH Maven repository with the CDH versions of the artifacts. CDH releases don't always align exactly with Apache releases: they may include additional bug fixes or features, or may have non-production-ready features disabled. I believe these are the docs for how to reference them via Maven: https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh5_maven_repo.html

E.g. here is hadoop-core for 5.11.2: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-core/2.6.0-mr1-cdh5.11.2/

Tim Armstrong · ‎08-21-2018

I'd recommend using the versions matching your CDH installation.

Tim Armstrong · ‎08-21-2018

Hi @Yurii, thanks for the bug report. We'll look into it. What version of Impala did you see this on?

One thing worth trying is changing the PARQUET_ARRAY_RESOLUTION query option to THREE_LEVEL: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet_array_resolution.html

We have sometimes seen similar symptoms with Parquet nested types because of some ambiguity in encodings, e.g. https://issues.apache.org/jira/browse/IMPALA-4725. The original version of Impala's nested types defaulted to detecting an older representation of arrays first, and we had to keep that behaviour in Impala 2.x/CDH5.x for backwards compatibility. In Impala 3.0/CDH6.0 we're defaulting to the standard array resolution method.

Tim Armstrong · ‎08-03-2018

We looked at adding this a while back but ran into some technical issues around whether it should discover new partitions that would match the predicates: https://issues.apache.org/jira/browse/IMPALA-4105 We have a few people actively working on REFRESH/INVALIDATE and related issues, hoping to drastically improve the experience there.
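In the meantime, a client can generate one REFRESH statement per partition and run them in a loop. Below is a sketch of the statement generation only. It assumes a release that supports the PARTITION clause on REFRESH and a single integer partition column; the table name sales and column year are made-up examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RefreshStatements {
    // Builds one "REFRESH <table> PARTITION (<column>=<value>)" statement per value.
    // Only integer partition values are handled, to sidestep quoting rules.
    static List<String> refreshPerPartition(String table, String partitionColumn,
                                            List<Integer> values) {
        List<String> statements = new ArrayList<>();
        for (int value : values) {
            statements.add("REFRESH " + table
                + " PARTITION (" + partitionColumn + "=" + value + ")");
        }
        return statements;
    }

    public static void main(String[] args) {
        // Hypothetical table and partition column, for illustration only.
        for (String sql : refreshPerPartition("sales", "year", Arrays.asList(2017, 2018))) {
            System.out.println(sql);
        }
    }
}
```

Each generated string would then be executed as its own statement. Note that this only refreshes partitions you name explicitly, so it sidesteps the question of discovering new partitions rather than answering it.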

Last Visited	‎02-11-2021 06:07 PM

Member Since	‎07-29-2015 04:07 PM
Posts	535
Kudos received	141

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: JDBC/ODBC: regexp_* functions' behavior depend...

Re: JDBC/ODBC: regexp_* functions' behavior depend...

Re: lateral view explode in impala?

Re: Impala Memory limit exceeded despite setting m...

Re: Clarification on "Failed to get minimum memory...

Re: Impala Memory limit exceeded despite setting m...

Re: hadoop-core and hive-exec versions for UDF

Re: hadoop-core and hive-exec versions for UDF

Re: Impala bug with nested arrays of structures wh...

Re: Refreshing multiple partitions in single quer...