Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7627 | 12-18-2020 01:46 PM
 | 4992 | 12-16-2020 12:11 PM
 | 3804 | 12-07-2020 01:47 PM
 | 2476 | 12-07-2020 09:21 AM
 | 1615 | 10-14-2020 11:15 AM
05-11-2018
02:51 PM
Hi @Tim Armstrong, hope you are doing well. It would be nice to have a metric for how much of daemons_memory_limit is actually used by the Impala daemon at a given time. When a query fails on memory, I could then investigate the memory usage, which would help me understand when to increase the limit; I could also learn the trend and usage over time and plan my increases. Currently I see only the resident memory per node, but that memory isn't what the queries use, so it's a difficult task for me to investigate Impala's behaviour once a query fails on memory. Yes, I have a metric for the total memory used by the node, but I have different roles on the node, so it is hard to track this issue.
05-02-2018
02:16 PM
Thanks Romainr, you pointed us in the right direction. I was setting it in several incorrect places in CM Impala and Hue. The place that finally worked was Hue --> Configuration --> Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini:
[impala]
query_timeout_s=86400
05-01-2018
01:28 AM
The reason you were seeing HdfsParquetTableWriter::ColumnWriter is that I was testing the bug using the syntax:
CREATE TABLE db.newTable STORED AS PARQUET AS
SELECT a.topLevelField, b.priceFromNestedField
FROM db.table a LEFT JOIN a.nestedField b
This was purely to force the bug to occur. If you just ran the SELECT in Hue it would often succeed, because Hue only brings back the first 100 rows; to consistently trigger the crash I had to make Impala read from both Parquet files. No other query was running at the time. Anyway, as Chris says, the bug appears to be fixed in 5.14.2. The job which originally triggered the crash consistently has now been running unchanged over the same source data for 20 hours without a hitch. Thanks for your help, Matt
03-28-2018
09:43 AM
@alpertankut current link is https://www.cloudera.com/documentation/enterprise/latest/topics/impala_analytic_functions.html#row_number
03-28-2018
12:19 AM
2 Kudos
I've finally solved it using the executeUpdate method:

// Invalidate metadata and rebuild the index on Impala.
try {
    Statement stmt = impalaConn.createStatement();
    try {
        // executeUpdate returns an affected-row count, not a ResultSet,
        // so there are no rows to iterate over afterwards. Note: no
        // trailing semicolon in the statement string, as some JDBC
        // drivers reject it.
        String query = "INVALIDATE METADATA";
        int result = stmt.executeUpdate(query);
    }
    finally {
        stmt.close();
    }
}
catch (SQLException ex) {
    // Print the whole chain of SQL exceptions, not just the first.
    while (ex != null) {
        ex.printStackTrace();
        ex = ex.getNextException();
    }
    System.exit(1);
}

Thanks for help!
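The key distinction above is between statements that return a row count and statements that return rows. As a minimal illustration (in Python's DB-API against an in-memory SQLite database, since an Impala connection isn't available here; table name t and the values are made up), the same split exists there: an update-style statement gives you a count, a query gives you rows to iterate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.execute("INSERT INTO t VALUES (1), (2)")

# An update-style statement yields an affected-row count,
# analogous to JDBC's executeUpdate().
cur.execute("UPDATE t SET x = x + 1")
print(cur.rowcount)  # 2

# A query yields rows to iterate, analogous to executeQuery().
cur.execute("SELECT x FROM t ORDER BY x")
print(cur.fetchall())  # [(2,), (3,)]
```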
03-27-2018
01:57 PM
Thanks very much, Tim, for looking up the JIRA. Yikes, it's been open since 2014. As John pointed out there, the column order info must be in the metastore, since Hive's SHOW CREATE TABLE displays fine, so it seems this should be a simple change to how Impala reads that info. Upvoted the JIRA.
03-26-2018
02:47 PM
Are there any NULLs in idvar? If so, you could be getting tripped up by the interaction between NOT IN and NULL values. One interesting quirk of SQL is that in some cases IN and NOT IN can both be false for the same row and subquery. E.g. I can recreate a similar scenario if the only value in the subquery is a NULL.

[localhost:21000] > select count(distinct int_col) from functional.alltypestiny;
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2 |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where int_col in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0 |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where int_col not in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0 |
+-------------------------+

I suspect it might be easier to understand if you use NOT EXISTS. It is almost equivalent to NOT IN, but the handling of NULL values is more intuitive.

[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where not exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2 |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0 |
+-------------------------+
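The NOT IN / NULL quirk above isn't Impala-specific; it follows from SQL's three-valued logic, where `x NOT IN (NULL)` evaluates to NULL (neither true nor false) and so filters out every row. A minimal sketch reproducing the same behavior with Python's stdlib sqlite3 (table names t1/t2 are made up for the illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (x INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
cur.execute("CREATE TABLE t2 (x INTEGER)")
cur.execute("INSERT INTO t2 VALUES (NULL)")

# IN against a NULL-only subquery evaluates to NULL, which the
# WHERE clause treats as false -> 0 rows match.
cur.execute("SELECT COUNT(*) FROM t1 WHERE x IN (SELECT x FROM t2)")
print(cur.fetchone()[0])  # 0

# NOT IN against the same subquery is also NULL -> also 0 rows.
cur.execute("SELECT COUNT(*) FROM t1 WHERE x NOT IN (SELECT x FROM t2)")
print(cur.fetchone()[0])  # 0

# NOT EXISTS compares row by row; the NULL never equals anything,
# so the subquery is empty for every t1 row -> all 2 rows match.
cur.execute(
    "SELECT COUNT(*) FROM t1 "
    "WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.x = t1.x)"
)
print(cur.fetchone()[0])  # 2
```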
02-22-2018
01:19 AM
Thank you. It worked.
02-09-2018
03:36 PM
@siddesh210491 the simplest solution might be to set the safety valve globally as above. That will apply to all clients, not just Hue, but it may be a reasonable setting for the others too. Otherwise, another option is to use the query_timeout_s query option. You can set a default value for that option (or any query option) if you have dynamic resource pools set up, with all Hue queries going into a pool. https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_resource_pools.html#concept_xkk_l1d_wr__impala_dynamic_pool_settings
02-05-2018
04:18 PM
Gzip decompression will definitely use more CPU than Snappy decompression, so I'd usually expect Gzip to give you worse performance, unless your query is limited by disk I/O (in which case smaller is better) or your query isn't limited by scan performance at all.
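The size-versus-CPU trade-off is easy to see empirically. As a rough illustration only (Snappy isn't in the Python standard library, so this uses stdlib gzip at its fastest and slowest compression levels as a stand-in for the "cheap" and "expensive" ends of the spectrum; the data is synthetic):

```python
import gzip
import os
import time

# Synthetic data: an incompressible random prefix plus a highly
# repetitive tail, to give the compressor something to work with.
data = os.urandom(1_000_000) + b"a" * 4_000_000

for level in (1, 9):
    t0 = time.perf_counter()
    comp = gzip.compress(data, compresslevel=level)
    t_compress = time.perf_counter() - t0

    t0 = time.perf_counter()
    restored = gzip.decompress(comp)
    t_decompress = time.perf_counter() - t0

    assert restored == data  # round trip is lossless
    print(f"level {level}: {len(comp):>9} bytes, "
          f"compress {t_compress:.3f}s, decompress {t_decompress:.3f}s")
```

Higher levels spend more CPU to shave off bytes; whether that pays off depends on whether your workload is I/O-bound (smaller wins) or CPU-bound (cheaper codec wins), which is exactly the trade-off described above.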