Member since 07-29-2015
Posts: 535
Kudos received: 140
Solutions: 103
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 6053 | 12-18-2020 01:46 PM |
| | 3931 | 12-16-2020 12:11 PM |
| | 2782 | 12-07-2020 01:47 PM |
| | 1988 | 12-07-2020 09:21 AM |
| | 1277 | 10-14-2020 11:15 AM |
06-23-2016 06:42 PM
I don't think it's on the immediate roadmap; our focus recently has been on other things (performance, Amazon EC2 support, etc.).
06-23-2016 01:05 PM
Unfortunately, that feature has not made it in. The documentation at http://www.cloudera.com/documentation.html is the source of truth for which features are and are not present.
06-17-2016 04:04 PM
OK, that's interesting. The nonsense-looking symbol at the top is probably JIT-compiled code from your query, most likely an expression or something similar:

    36.50% perf-18476.map [.] 0x00007f3c1d634b82

The other symbols, such as GetDoubleVal(), may be what is calling this expensive function. It looks like ProbeTime in the profile may be the culprit. Can you share the SQL for your query? I'm guessing that some expression in your query is expensive to evaluate, e.g. joining on a complex expression or doing some other kind of expensive computation.
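For context on why the top entry shows only a raw address: perf attributes samples that land in anonymous executable memory (such as JIT-compiled code) to `perf-<pid>.map`, and it can only name them if the process has written a symbol map to `/tmp/perf-<pid>.map`, one `START SIZE symbolname` entry per line with START and SIZE in hex. The sketch below assumes that format and resolves an address the way perf would; the map contents are made up, and whether Impala writes such a map is not something stated here.

```python
"""Resolve an address against a perf JIT symbol map (/tmp/perf-<pid>.map).

Assumed format (perf's JIT interface): one entry per line,
"START SIZE symbolname", with START and SIZE in hex (no 0x prefix).
"""
import bisect
from typing import List, Optional, Tuple

Entry = Tuple[int, int, str]  # (start address, size in bytes, symbol name)


def parse_perf_map(text: str) -> List[Entry]:
    """Parse perf-map text into entries sorted by start address."""
    entries: List[Entry] = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)  # symbol names may contain spaces
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
    entries.sort()
    return entries


def resolve(entries: List[Entry], addr: int) -> Optional[str]:
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1  # last entry starting <= addr
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return None  # perf would show the bare address in this case


if __name__ == "__main__":
    # Made-up map contents for illustration only.
    sample = "7f3c1d634000 1000 ExprEval_hypothetical\n7f3c1d640000 200 Other\n"
    print(resolve(parse_perf_map(sample), 0x00007F3C1D634B82))
```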
06-17-2016 11:23 AM
Maybe run 'perf top' to see where it's spending the time? I'd expect the scan to run on one core and the join and insert to run on a different core.
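`perf top` shows where CPU time goes. To check the second point (which core each thread is running on), one option on Linux, a swapped-in technique rather than something suggested above, is to read `/proc/<pid>/task/<tid>/stat`: per proc(5), field 14 is utime, field 15 is stime (both in clock ticks) and field 39 is the CPU the thread last ran on. A minimal sketch; pass it the impalad process ID from your own machine.

```python
"""List each thread of a process with its CPU time and last-used core (Linux only)."""
import os
import sys
from typing import Iterator, Tuple

# Clock ticks per second, used to convert utime/stime to seconds.
TICKS_PER_SEC = os.sysconf("SC_CLK_TCK") if hasattr(os, "sysconf") else 100


def parse_task_stat(line: str) -> Tuple[str, int, int, int]:
    """Return (comm, utime_ticks, stime_ticks, processor) from a stat line.

    comm is wrapped in parentheses and may contain spaces or ')', so split
    on the last ')' rather than on whitespace.
    """
    open_paren, close_paren = line.index("("), line.rindex(")")
    comm = line[open_paren + 1 : close_paren]
    rest = line[close_paren + 2 :].split()  # rest[0] is field 3 (state)
    # Field n (1-based, as numbered in proc(5)) lives at rest[n - 3].
    return comm, int(rest[14 - 3]), int(rest[15 - 3]), int(rest[39 - 3])


def threads(pid: int) -> Iterator[Tuple[int, str, float, int]]:
    """Yield (tid, name, cpu_seconds, last_cpu) for every thread of pid."""
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                comm, utime, stime, cpu = parse_task_stat(f.read())
        except FileNotFoundError:
            continue  # thread exited while we were iterating
        yield int(tid), comm, (utime + stime) / TICKS_PER_SEC, cpu


if __name__ == "__main__" and len(sys.argv) > 1:
    for tid, name, secs, cpu in threads(int(sys.argv[1])):
        print(f"{tid:>8} {name:<20} cpu={secs:8.2f}s last_core={cpu}")
```

Running it a few times while the query executes shows whether the busy threads stay on separate cores.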
06-17-2016 09:19 AM
There's something strange going on here: the profile reports that the scan took around 12 seconds of CPU time but 17 minutes of wall-clock time.

- MaterializeTupleTime(*): 17m20s
- ScannerThreadsSysTime: 74.049ms
- ScannerThreadsUserTime: 12s312ms

So for whatever reason, the scan is spending most of its time swapped out and unable to execute. Is the system under heavy load, or is it swapping to disk?
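To make the gap concrete: the scanner threads' CPU time is 12.312 s + 0.074 s ≈ 12.4 s, against 17m20s = 1040 s for MaterializeTupleTime, so the threads were on a CPU for only about 1.2% of that time. A small sketch that parses the profile's duration strings and computes the ratio; the parser assumes only the h/m/s/ms/us/ns unit suffixes.

```python
import re

# Unit suffixes assumed to appear in profile durations (e.g. "17m20s").
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# "ms" must be tried before "m" and "s" so "312ms" is not misread.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")


def parse_duration(text: str) -> float:
    """Convert a profile duration such as '12s312ms' into seconds."""
    text = text.strip()
    tokens = _TOKEN.findall(text)
    # Reject strings with leftover characters the pattern did not consume.
    if not tokens or "".join(n + u for n, u in tokens) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in tokens)


def cpu_fraction(wall: str, user: str, sys_: str) -> float:
    """Fraction of the wall-clock time that the threads actually spent on a CPU."""
    return (parse_duration(user) + parse_duration(sys_)) / parse_duration(wall)


if __name__ == "__main__":
    frac = cpu_fraction("17m20s", "12s312ms", "74.049ms")
    print(f"on-CPU fraction: {frac:.1%}")  # → on-CPU fraction: 1.2%
```

The remaining ~98.8% is time the scan was waiting rather than running, which is why heavy load or swapping is the first thing to check.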
05-23-2016 08:44 AM
Yes, that's right. In Impala 2.5 it's enabled by default for some cases (broadcast joins). To enable it for a wider category of joins, set the query option runtime_filter_mode=global. This setting will become the default in Impala 2.7 because of its performance benefits.
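To illustrate why runtime filters pay off: the join's build side collects its join keys into a compact filter, which is applied on the probe-side scan so rows that cannot match are dropped before they reach the join. The sketch below uses a small Bloom filter (the filter type Impala's documentation describes for this; the size and hash choice here are arbitrary) and is a conceptual model, not Impala's implementation. In Impala itself you only need `SET RUNTIME_FILTER_MODE=GLOBAL;` in your session.

```python
"""Conceptual model of a join runtime filter built as a Bloom filter."""
import hashlib
from typing import Iterable, List, Tuple

Row = Tuple[object, ...]


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: object) -> List[int]:
        # Derive k bit positions from k differently-salted SHA-256 hashes.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key!r}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "little") % self.num_bits)
        return positions

    def add(self, key: object) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: object) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def build_filter(build_keys: Iterable[object]) -> BloomFilter:
    """Build side of the join: insert every join key into the filter."""
    bf = BloomFilter()
    for k in build_keys:
        bf.add(k)
    return bf


def filtered_scan(rows: Iterable[Row], key_index: int, bf: BloomFilter) -> List[Row]:
    """Probe-side scan: skip rows whose key definitely has no match."""
    return [row for row in rows if bf.might_contain(row[key_index])]


if __name__ == "__main__":
    dim_keys = [3, 7, 42]                              # small build side
    fact_rows = [(k, f"row{k}") for k in range(1000)]  # large probe side
    survivors = filtered_scan(fact_rows, 0, build_filter(dim_keys))
    # Every real match survives; a few false positives could as well.
    print(len(survivors), [r for r in survivors if r[0] in dim_keys])
```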
05-02-2016 09:21 PM
We often use TPC-H and TPC-DS; they're fairly standard benchmarks for analytical databases. There's a TPC-DS kit for Impala here: https://github.com/cloudera/impala-tpcds-kit
05-02-2016 09:19 PM
Unfortunately, there's no direct way to find this out from the profile. If you have a live system, you can look at the /threadz page of the Impala debug web UI (port 25000 on each Impala daemon by default) to see how many hdfs-scan-node threads are running.
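If you want to script that check, here is a minimal sketch. It fetches the debug page over plain HTTP and counts lines that mention `hdfs-scan-node`. The host name is yours to supply, and the assumption that each thread appears on its own line is unverified: the page layout may differ between Impala versions, and a group heading that mentions the name would be counted too.

```python
"""Count hdfs-scan-node threads listed on an impalad's /threadz debug page."""
import sys
import urllib.request

DEFAULT_DEBUG_PORT = 25000  # impalad debug web UI port, per the post


def fetch_threadz(host: str, port: int = DEFAULT_DEBUG_PORT, timeout: float = 10.0) -> str:
    """Download the raw /threadz page (HTML) from one Impala daemon."""
    url = f"http://{host}:{port}/threadz"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def count_threads(page_text: str, name: str = "hdfs-scan-node") -> int:
    """Count lines mentioning `name`; assumes one thread per line (unverified)."""
    return sum(1 for line in page_text.splitlines() if name in line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_scan_threads.py <impalad-host>
    print(count_threads(fetch_threadz(sys.argv[1])))
```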
04-29-2016 10:59 AM
Impala limits the number of threads executing the query plan by design. It dynamically increases the number of scanner threads provided CPU and memory resources are available; in this case it seems there weren't enough CPU resources available. If the machine is already busy, adding more threads can actually decrease query throughput.
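The decision described above can be modeled roughly as follows. This is a hypothetical simplification that only shows the shape of the policy (add a scanner thread while an idle core and enough memory remain), not Impala's actual code; every name and threshold here is invented for illustration.

```python
"""Hypothetical model of a 'grow scanner threads while resources allow' policy."""
from dataclasses import dataclass


@dataclass
class NodeState:
    active_threads: int       # threads currently running on this node (all queries)
    num_cores: int            # CPU cores available to the daemon
    free_mem_bytes: int       # memory still available to this query
    scanner_threads: int      # scanner threads this scan already has
    max_scanner_threads: int  # hypothetical per-scan cap


def should_add_scanner_thread(s: NodeState, mem_per_thread: int) -> bool:
    """Start another scanner thread only if a core is idle and memory fits."""
    if s.scanner_threads >= s.max_scanner_threads:
        return False
    cpu_available = s.active_threads < s.num_cores  # at least one idle core
    mem_available = s.free_mem_bytes >= mem_per_thread
    return cpu_available and mem_available


def grow(s: NodeState, mem_per_thread: int) -> int:
    """Add threads one at a time until the policy says stop; return the total."""
    while should_add_scanner_thread(s, mem_per_thread):
        s.scanner_threads += 1
        s.active_threads += 1
        s.free_mem_bytes -= mem_per_thread
    return s.scanner_threads


if __name__ == "__main__":
    busy = NodeState(active_threads=16, num_cores=16, free_mem_bytes=8 << 30,
                     scanner_threads=1, max_scanner_threads=16)
    idle = NodeState(active_threads=2, num_cores=16, free_mem_bytes=8 << 30,
                     scanner_threads=1, max_scanner_threads=16)
    print(grow(busy, 64 << 20), grow(idle, 64 << 20))  # → 1 15
```

Under this model a node whose cores are all occupied keeps a single scanner thread: extra threads on saturated cores would only add contention, not throughput.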
04-28-2016 08:50 AM
The algorithm is described in the Impala source code here, if you (or anyone else reading) are interested: https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/partitioned-hash-join-node.h
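For readers who want the basic idea before reading the C++, a partitioned hash join can be sketched in a few lines: hash-partition both inputs on the join key, then for each partition build a hash table from the build side and probe it with the matching probe-side partition. This is a simplified in-memory sketch (inner join only, partition count arbitrary) and not a transcription of Impala's node; it omits things a production implementation needs, such as spilling partitions that don't fit in memory.

```python
"""Simplified in-memory partitioned hash join (inner join)."""
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple  # a row is just a tuple here
NUM_PARTITIONS = 4  # arbitrary; real systems size this from memory limits


def partition(rows: Sequence[Row], key_idx: int, n: int) -> List[List[Row]]:
    """Split rows into n buckets by hash of the join key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key_idx]) % n].append(row)
    return parts


def partitioned_hash_join(build: Sequence[Row], probe: Sequence[Row],
                          build_key: int, probe_key: int,
                          n: int = NUM_PARTITIONS) -> List[Row]:
    """Join matching partitions pairwise; equal keys always share a partition."""
    out: List[Row] = []
    build_parts = partition(build, build_key, n)
    probe_parts = partition(probe, probe_key, n)
    for b_part, p_part in zip(build_parts, probe_parts):
        # Build phase: hash table for this partition only. In a real system
        # this per-partition table is the piece that must fit in memory.
        table: Dict[Hashable, List[Row]] = defaultdict(list)
        for row in b_part:
            table[row[build_key]].append(row)
        # Probe phase: look up each probe row in the same partition's table.
        for row in p_part:
            for match in table.get(row[probe_key], ()):
                out.append(match + row)
    return out


if __name__ == "__main__":
    dims = [(1, "a"), (2, "b"), (3, "c")]
    facts = [(10, 1), (11, 3), (12, 3), (13, 9)]
    print(sorted(partitioned_hash_join(dims, facts, build_key=0, probe_key=1)))
    # → [(1, 'a', 10, 1), (3, 'c', 11, 3), (3, 'c', 12, 3)]
```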