Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7627 | 12-18-2020 01:46 PM
 | 4992 | 12-16-2020 12:11 PM
 | 3804 | 12-07-2020 01:47 PM
 | 2476 | 12-07-2020 09:21 AM
 | 1615 | 10-14-2020 11:15 AM
05-11-2018
02:51 PM
Hi @Tim Armstrong, hope you are doing well. It would be nice to have a metric for how much of daemons_memory_limit is actually used by the Impala daemon at a given time. When a query fails on memory, I could then investigate the memory usage, which would help me understand when to increase the limit; I could also learn the trend and usage over time and plan my increases. Currently I see only the resident memory per node, but that memory isn't what the queries use, so it's a difficult task for me to investigate Impala's behaviour once a query fails on memory. Yes, I have a metric for the total memory used by the node, but I have different roles on the node, so it is hard to track this issue.
05-02-2018
02:16 PM
Thanks Romainr, you pointed us in the right direction. I was setting it in several incorrect places in CM Impala and Hue. The place that finally worked was Hue --> Configuration --> Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini:
[impala]
query_timeout_s=86400
05-01-2018
01:28 AM
The reason you were seeing HdfsParquetTableWriter::ColumnWriter is that I was testing the bug using the syntax:
CREATE TABLE db.newTable STORED AS PARQUET AS
SELECT a.topLevelField, b.priceFromNestedField
FROM db.table a LEFT JOIN a.nestedField b
This was purely to force the bug to occur. If you just ran the SELECT in Hue it would often succeed, because Hue only brings back the first 100 rows; to consistently trigger the crash I had to make Impala read from both Parquet files. No other query was running at the time. Anyway, as Chris says, the bug appears to be fixed in 5.14.2. The job which originally triggered the crash consistently has now been running unchanged over the same source data for 20 hours without a hitch. Thanks for your help, Matt
03-28-2018
09:43 AM
@alpertankut current link is https://www.cloudera.com/documentation/enterprise/latest/topics/impala_analytic_functions.html#row_number
03-28-2018
12:19 AM
2 Kudos
I've finally solved it using the executeUpdate method:

// Invalidate metadata and rebuild the index on Impala.
try {
    Statement stmt = impalaConn.createStatement();
    try {
        // executeUpdate returns an affected-row count, not a ResultSet,
        // so there are no rows to iterate over afterwards. Note: no
        // trailing semicolon in the statement string, as some JDBC
        // drivers reject it.
        String query = "INVALIDATE METADATA";
        int result = stmt.executeUpdate(query);
    }
    finally {
        stmt.close();
    }
}
catch (SQLException ex) {
    // Print the whole chain of SQL exceptions, not just the first.
    while (ex != null) {
        ex.printStackTrace();
        ex = ex.getNextException();
    }
    System.exit(1);
}

Thanks for help!
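The key distinction above is between statements that return a row count and statements that return rows. As a minimal illustration (in Python's DB-API against an in-memory SQLite database, since an Impala connection isn't available here; table name t and the values are made up), the same split exists there: an update-style statement gives you a count, a query gives you rows to iterate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.execute("INSERT INTO t VALUES (1), (2)")

# An update-style statement yields an affected-row count,
# analogous to JDBC's executeUpdate().
cur.execute("UPDATE t SET x = x + 1")
print(cur.rowcount)  # 2

# A query yields rows to iterate, analogous to executeQuery().
cur.execute("SELECT x FROM t ORDER BY x")
print(cur.fetchall())  # [(2,), (3,)]
```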
03-27-2018
01:57 PM
Thanks very much, Tim, for looking up the JIRA. Yikes, it's been open since 2014. As John pointed out there, the column order info must be in the metastore, since Hive's SHOW CREATE TABLE displays fine, so it seems this should be a simple change to how Impala reads that info. Upvoted the JIRA.
03-26-2018
02:47 PM
Are there any NULLs in idvar? If so, you could be getting tripped up by the interaction between NOT IN and NULL values. One interesting quirk of SQL is that in some cases IN and NOT IN can both be false for the same row and subquery. E.g. I can recreate a similar scenario if the only value in the subquery is a NULL.

[localhost:21000] > select count(distinct int_col) from functional.alltypestiny;
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2 |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where int_col in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0 |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where int_col not in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0 |
+-------------------------+

I suspect it might be easier to understand if you use NOT EXISTS. It is almost equivalent to NOT IN, but the handling of NULL values is more intuitive.

[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where not exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2 |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0 |
+-------------------------+
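The NOT IN / NULL quirk above isn't Impala-specific; it follows from SQL's three-valued logic, where `x NOT IN (NULL)` evaluates to NULL (neither true nor false) and so filters out every row. A minimal sketch reproducing the same behavior with Python's stdlib sqlite3 (table names t1/t2 are made up for the illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (x INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
cur.execute("CREATE TABLE t2 (x INTEGER)")
cur.execute("INSERT INTO t2 VALUES (NULL)")

# IN against a NULL-only subquery evaluates to NULL, which the
# WHERE clause treats as false -> 0 rows match.
cur.execute("SELECT COUNT(*) FROM t1 WHERE x IN (SELECT x FROM t2)")
print(cur.fetchone()[0])  # 0

# NOT IN against the same subquery is also NULL -> also 0 rows.
cur.execute("SELECT COUNT(*) FROM t1 WHERE x NOT IN (SELECT x FROM t2)")
print(cur.fetchone()[0])  # 0

# NOT EXISTS compares row by row; the NULL never equals anything,
# so the subquery is empty for every t1 row -> all 2 rows match.
cur.execute(
    "SELECT COUNT(*) FROM t1 "
    "WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.x = t1.x)"
)
print(cur.fetchone()[0])  # 2
```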
02-22-2018
01:19 AM
Thank you. It worked.
02-09-2018
03:36 PM
@siddesh210491 the simplest solution might be to set the safety valve globally as above. That will apply to all clients, not just Hue, but it may be a reasonable setting for the others too. Otherwise, another option is to use the query_timeout_s query option. You can set a default value for that option (or any query option) if you have dynamic resource pools set up, with all Hue queries going into a pool. https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_resource_pools.html#concept_xkk_l1d_wr__impala_dynamic_pool_settings
02-05-2018
04:18 PM
Gzip decompression will definitely use more CPU than Snappy decompression, so I'd usually expect Gzip to give you worse performance, unless your query is limited by disk I/O (in which case smaller is better) or your query isn't limited by scan performance at all.
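The size-versus-CPU trade-off is easy to see empirically. As a rough illustration only (Snappy isn't in the Python standard library, so this uses stdlib gzip at its fastest and slowest compression levels as a stand-in for the "cheap" and "expensive" ends of the spectrum; the data is synthetic):

```python
import gzip
import os
import time

# Synthetic data: an incompressible random prefix plus a highly
# repetitive tail, to give the compressor something to work with.
data = os.urandom(1_000_000) + b"a" * 4_000_000

for level in (1, 9):
    t0 = time.perf_counter()
    comp = gzip.compress(data, compresslevel=level)
    t_compress = time.perf_counter() - t0

    t0 = time.perf_counter()
    restored = gzip.decompress(comp)
    t_decompress = time.perf_counter() - t0

    assert restored == data  # round trip is lossless
    print(f"level {level}: {len(comp):>9} bytes, "
          f"compress {t_compress:.3f}s, decompress {t_decompress:.3f}s")
```

Higher levels spend more CPU to shave off bytes; whether that pays off depends on whether your workload is I/O-bound (smaller wins) or CPU-bound (cheaper codec wins), which is exactly the trade-off described above.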