Member since 07-29-2015
535 Posts
140 Kudos Received
103 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4501 | 12-18-2020 01:46 PM |
| | 2802 | 12-16-2020 12:11 PM |
| | 1882 | 12-07-2020 01:47 PM |
| | 1473 | 12-07-2020 09:21 AM |
| | 967 | 10-14-2020 11:15 AM |
09-19-2018
04:05 PM
Hi @Svyat, I tested with Impala JDBC 2.6.4 and it appears to be resolved. I now get this output from my test program with the two different versions of the query:

Running query 1: select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text from test.a;
col0= (null=false) col1=a (null=false) col2=a (null=false)
Running query 2: select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text, text = 'a' from test.a;
col0= (null=false) col1=a (null=false) col2=a (null=false)

Test code is:

import java.sql.*;

public class JDBCRegex {
  // JDBC driver name and database URL
  static final String JDBC_DRIVER = "com.cloudera.impala.jdbc41.Driver";
  static final String DB_URL = "jdbc:impala://localhost:21050/";

  public static void main(String[] args) {
    Connection conn = null;
    Statement stmt = null;
    try{
      Class.forName(JDBC_DRIVER);
      System.out.println("Connecting to a selected database...");
      conn = DriverManager.getConnection(DB_URL, "", "");
      System.out.println("Connected database successfully...");
      System.out.println("Creating statement...");
      stmt = conn.createStatement();
      String sql = "select regexp_extract(text, '\\w', 0), regexp_extract(text, '\\\\w', 0), text from test.a;";
      ResultSet rs = stmt.executeQuery(sql);
      System.out.println("Running query 1: " + sql);
      while(rs.next()) {
        System.out.println("col0=" + rs.getString(1) + " (null=" + rs.wasNull() + ") " +
            "col1=" + rs.getString(2) + " (null=" + rs.wasNull() + ") " +
            "col2=" + rs.getString(3) + " (null=" + rs.wasNull() + ") ");
      }
      rs.close();
      // Add an unrelated comparison expression.
      sql = "select regexp_extract(text, '\\w', 0), regexp_extract(text, '\\\\w', 0), text, text = 'a' from test.a;";
      System.out.println("Running query 2: " + sql);
      rs = stmt.executeQuery(sql);
      while(rs.next()) {
        System.out.println("col0=" + rs.getString(1) + " (null=" + rs.wasNull() + ") " +
            "col1=" + rs.getString(2) + " (null=" + rs.wasNull() + ") " +
            "col2=" + rs.getString(3) + " (null=" + rs.wasNull() + ") ");
      }
      rs.close();
    }catch(SQLException se){
      //Handle errors for JDBC
      se.printStackTrace();
    }catch(Exception e){
      //Handle errors for Class.forName
      e.printStackTrace();
    }finally{
      //finally block used to close resources
      try{
        // Close the statement (the original closed the connection here by mistake)
        if(stmt!=null)
          stmt.close();
      }catch(SQLException se){
        // do nothing
      }
      try{
        if(conn!=null)
          conn.close();
      }catch(SQLException se){
        se.printStackTrace();
      }
    }
  }
}

I ran from the command line with:

javac JDBCRegex.java && CLASSPATH=~/ClouderaImpalaJDBC-2.6.4.1005/ImpalaJDBC41.jar:. time java JDBCRegex
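A note on the backslashes, since they are easy to misread: the doubled backslashes in the Java source are Java string-literal escapes, which is why the printed SQL shows '\w' and '\\w'. Here is a minimal sketch (class and method names are made up for illustration) that counts the backslashes actually present in the string handed to the driver. It says nothing about how Impala then interprets them; only the test output shows that.

```java
public class BackslashEscapes {
    // The two regexp_extract calls exactly as they are written in the Java source.
    static final String TWO_IN_SOURCE = "regexp_extract(text, '\\w', 0)";
    static final String FOUR_IN_SOURCE = "regexp_extract(text, '\\\\w', 0)";

    // Counts the backslash characters present in the string at runtime.
    static long countBackslashes(String s) {
        return s.chars().filter(c -> c == '\\').count();
    }

    public static void main(String[] args) {
        // Each pair of backslashes in the source becomes one backslash at runtime.
        System.out.println(TWO_IN_SOURCE + " has " + countBackslashes(TWO_IN_SOURCE) + " backslash(es)");
        System.out.println(FOUR_IN_SOURCE + " has " + countBackslashes(FOUR_IN_SOURCE) + " backslash(es)");
    }
}
```

So the first expression reaches the SQL parser with one backslash and the second with two.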
09-18-2018
09:33 AM
Thanks for letting us know about this, I'll see what I can do to get it fixed in a future release.
09-11-2018
11:16 AM
1 Kudo
We don't support this in Impala right now. We'd generally recommend using Hive to prepare such data for querying.
09-10-2018
02:30 PM
@BorjaRodriguez that looks like a different known issue that we've seen with incremental stats on tables with large numbers of columns. It's fixed in CDH5.13+.
09-06-2018
12:47 PM
1 Kudo
Untracked memory is really anything that isn't explicitly tracked by query execution. We track all of the large amounts of memory used by query execution - buffers for reading from disk, the actual row data, hash tables in joins, etc, etc. The untracked memory should be small relative to that - it's mostly overhead for control structures like the runtime profile and things like that. That's usually small but it could add up easily when there are lots of queries being left open. It looks like something is unhealthy there. For one, there are a lot of queries that are still hanging around. I'd guess that there's probably a client that is misbehaving and not closing queries once it is finished with them. Those queries look like they were probably cancelled (or had all the results fetched) but were not closed by the client. One workaround for problems like that is to set an idle session timeout to periodically clear out user sessions that are not active: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_timeouts.html
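On the client side, if the client is a JDBC program, the usual way to make sure queries are always closed is try-with-resources. This is only a sketch under assumptions: the class and method names are invented, the JDBC URL and SQL come from the command line, and I am assuming the driver closes the server-side query when the Statement is closed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ClosingQueries {
    // Runs a query and counts its rows. try-with-resources closes the ResultSet
    // and then the Statement when the block exits, including when an exception
    // is thrown part-way through fetching.
    static int countRows(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: java ClosingQueries <jdbc-url> <sql>");
            return;
        }
        // The connection is closed on exit as well.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            System.out.println("rows=" + countRows(conn, args[1]));
        }
    }
}
```

The idle session timeout is still worth setting as a safety net for clients you don't control.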
09-06-2018
11:24 AM
@Tomas79 Impala should definitely be able to stay under a query mem_limit by spilling to disk! I've put a lot of work into making that work more reliably over the last couple of years. I don't know what version you were experimenting with, but it's gotten much better over time. We had incremental improvements in most releases from CDH5.5 -> CDH5.12, then a big improvement in CDH5.13 (there was a big rework of the spill-to-disk code), and I expect we'll see another big increase in reliability in CDH6.1 based on the work we've already done.
My rules of thumb for what to expect are:
CDH5.9+: spilling is fairly reliable if the query has plenty of memory (300MB+ for each spilling agg and join, adequate memory for other query operators).
CDH5.13+: spilling of aggregation, hash join and sort (i.e. the most memory-hungry operators) is more reliable because those operators reserve memory for spilling and have lower minimum memory requirements. Other operators like SCAN need to have adequate memory to run in-memory. "Memory limit exceeded" should be rare if each query has a mem_limit of multiple GBs, e.g. 2GB+.
CDH6.1+: it should be much harder to hit a "memory limit exceeded" after a query starts running - we've fixed a lot of edge cases, particularly in the SCAN operators.
The most common reason I see for queries failing to spill is that there are concurrent queries running without mem_limits set, or the mem_limits add up to more than the process mem_limit. This has gotten better (and 6.1 should have further improvements), but we do rely on memory-based admission control being configured with mem_limits set to completely avoid OOM scenarios from multiple queries competing for the same memory.
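To make that last condition concrete, here is a small arithmetic sketch of the check. It is not how admission control is implemented; the class and method names are hypothetical, the numbers are illustrative, and treating a limit of 0 as "no mem_limit set" is just a convention of this sketch.

```java
public class MemLimitCheck {
    static final long GB = 1024L * 1024 * 1024;

    // Returns true when a set of concurrent queries could compete for the same
    // memory: either some query has no mem_limit (0 in this sketch), or the
    // per-query limits add up to more than the process mem_limit.
    static boolean atRisk(long processMemLimit, long... queryMemLimits) {
        long total = 0;
        for (long limit : queryMemLimits) {
            if (limit <= 0) {
                return true; // a query without a limit can use any amount
            }
            total += limit;
        }
        return total > processMemLimit;
    }

    public static void main(String[] args) {
        // Two 2GB queries fit under an 8GB process limit.
        System.out.println(atRisk(8 * GB, 2 * GB, 2 * GB));
        // 4GB + 4GB + 2GB adds up to more than 8GB.
        System.out.println(atRisk(8 * GB, 4 * GB, 4 * GB, 2 * GB));
        // One query has no mem_limit set.
        System.out.println(atRisk(8 * GB, 2 * GB, 0L));
    }
}
```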
08-23-2018
09:51 AM
We have a CDH Maven repository with the CDH versions of things. CDH releases don't always align exactly with Apache releases - they may include additional bug fixes or features, or may have non-production-ready features disabled. I believe these are the docs for how to reference them via Maven: https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh5_maven_repo.html E.g. here is hadoop-core for 5.11.2: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-core/2.6.0-mr1-cdh5.11.2/
08-21-2018
11:15 AM
I'd recommend using the versions matching your CDH installation.
08-21-2018
09:38 AM
Hi @Yurii, thanks for the bug report - we'll look into it. What version of Impala did you see this on? One thing worth trying is changing the PARQUET_ARRAY_RESOLUTION query option to THREE_LEVEL: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet_array_resolution.html We have sometimes seen similar symptoms with Parquet nested types because of some ambiguity in encodings, e.g. https://issues.apache.org/jira/browse/IMPALA-4725. The original version of Impala's nested types defaulted to detecting an older representation of arrays first, and we had to keep that behaviour in Impala 2.x/CDH5.x for backwards compatibility. In Impala 3.0/CDH6.0 we're defaulting to the standard array resolution method.
08-03-2018
09:50 AM
We looked at adding this a while back but ran into some technical issues around whether it should discover new partitions that would match the predicates: https://issues.apache.org/jira/browse/IMPALA-4105 We have a few people actively working on REFRESH/INVALIDATE and related issues, hoping to drastically improve the experience there.