Member since
07-29-2015
535
Posts
140
Kudos Received
103
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5802 | 12-18-2020 01:46 PM |
 | 3736 | 12-16-2020 12:11 PM |
 | 2647 | 12-07-2020 01:47 PM |
 | 1893 | 12-07-2020 09:21 AM |
 | 1225 | 10-14-2020 11:15 AM |
05-23-2016
08:44 AM
Yes, that's right. It's enabled for some cases by default (broadcast joins) in Impala 2.5. To enable it for a wider category of joins you can set the query option runtime_filter_mode=global. This setting will become the default in Impala 2.7 because of the performance benefits.
05-02-2016
09:19 PM
Unfortunately, there's no direct way to find this out from the profile. If you have a live system, you can look at the /threadz page on the Impala debug web page (port 25000 on each Impala daemon by default) to see how many hdfs-scan-node threads are running.
04-25-2016
06:51 AM
Thank you very much, Tim and Ivan. I have tested on the customer's system, and with "set disable_codegen=1;" the query does NOT crash the whole Impala cluster anymore. I will inform the customer and provide them with a link to this discussion.
04-22-2016
09:40 AM
Actually, it is the DataNode doing it. I guess I'll ask more about it as an HDFS topic. Thanks!
02-23-2016
08:39 PM
Hi, there are many possible variables, including the exact version of Impala, the operating system it was built on, the build flags and environment variables, and what version/build of dependencies you're using. I think the specific thing you're probably seeing with file sizes is that in the CDH distribution the debug symbols are stripped from the binaries and included in separate impalad.debug files. Are you running into some error when trying to run your custom build of Impala? It probably makes more sense to debug that problem rather than trying to exactly reproduce Cloudera's build.
01-27-2016
04:09 PM
You are most likely running into this bug with the aggregation:
https://issues.cloudera.org/browse/IMPALA-2352
We fixed it in CDH5.5/Impala 2.3 but the change wasn't backported because it was deemed too risky for a maintenance release.
01-20-2016
11:50 AM
You can create a 1-row dummy table like this:

select 1 id, 'a' d from (select 1) dual where 1 = 1

You also have to rewrite the query to avoid an uncorrelated not exists. You can do something like:

select 1 id, 'a' d from (select 1) dual where (select count(*) from employee where empid > 20000) = 0

Computing the count might be expensive, so you could add a limit like:

select 1 id, 'a' d from (select 1) dual where (select count(*) from (select empid from employee where empid > 20000 limit 1) emp) = 0
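To sanity-check the logic of the limit-based rewrite, here is a minimal sketch that runs it against an in-memory SQLite database. SQLite is only a stand-in for Impala here (an assumption made so the example runs anywhere), the single-column employee table and the helper name run_rewrite are hypothetical, and the inner select uses empid because that is the only employee column the queries above are known to reference.

```python
import sqlite3

# Limit-1 variant of the rewrite: the dummy row is returned only when
# no employee has empid > 20000.
QUERY = """
SELECT 1 AS id, 'a' AS d
FROM (SELECT 1) dual
WHERE (SELECT COUNT(*)
       FROM (SELECT empid FROM employee WHERE empid > 20000 LIMIT 1) emp) = 0
"""


def run_rewrite(empids):
    """Load the given empid values into a throwaway table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical minimal schema: only the column the query needs.
        conn.execute("CREATE TABLE employee (empid INTEGER)")
        conn.executemany(
            "INSERT INTO employee VALUES (?)", [(e,) for e in empids]
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_rewrite([100, 200]))    # no empid above 20000 -> [(1, 'a')]
    print(run_rewrite([100, 25000]))  # one empid above 20000 -> []
```

The inner limit 1 caps the counted rows at one, so the outer count is always 0 or 1 and the comparison with 0 behaves like not exists. Whether the engine actually stops scanning early depends on its planner, so it is worth checking the plan on the real Impala table.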
01-18-2016
09:27 AM
1 Kudo
If you can switch to Parquet, that's probably the best solution: it's generally the most performant file format for reading and produces the smallest file sizes. If for some reason you need to stick with text, the uncompressed data size needs to be less than 1GB per file.
12-27-2015
07:23 PM
The advice in this thread is out of date: memory usage for joins and aggregations has been improved a lot in CDH5.5. Your issue is something different since the query doesn't have a join or group by (aggregation) in it. The first step to understand this better is to look at the impalad logs: there is usually some information in there about why the memory limit was exceeded and what operators were consuming memory.
12-18-2015
09:17 AM
1 Kudo
I misread your question and didn't realise you wanted the per-host peak. PerHostPeakMemUsage gives you exactly what you want.