Member since 12-13-2013
39 Posts
8 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5394 | 10-10-2017 08:05 PM |
07-29-2019
03:46 PM
Thanks very much, Tim. I can confirm that it works like a charm, even with the group by, so yes, the docs should be updated, because that adds a lot of value over what was documented. P.S. I didn't get an email when you first replied, only yesterday with the latest ones. Thanks for the quick response.
07-24-2019
04:06 PM
The following query is super slow for us: select max(partition_col_1) from some_table where partition_col_2 = 'x'. It scans all records (hundreds of billions) in the filtered partitions, even though it isn't actually getting anything out of them, since the select list only includes a partitioning column. I don't think there is any need to read any files at all. Is there any way, or a hint, to get around this?
Labels: Apache Impala
04-18-2019
11:24 AM
1 Kudo
I just wanted to add to Todd's suggestion: if you also have CM, you can create a new chart with this query: "select total_kudu_on_disk_size_across_kudu_replicas where category=KUDU_TABLE". It will plot all your table sizes, and the graph detail will list the current values for all entries. It's probably not easily scriptable, but it is at least a way to quickly copy all sizes in one go, looking like this:
7.2T impala::<tablename_redacted> (Kudu)
9.8T impala::<tablename_redacted> (Kudu)
6.5T impala::<tablename_redacted> (Kudu)
4.1G impala::<tablename_redacted> (Kudu)
21.5G impala::<tablename_redacted> (Kudu)
15.2G impala::<tablename_redacted> (Kudu)
6.1T impala::<tablename_redacted> (Kudu)
98G impala::<tablename_redacted> (Kudu)
23.2G impala::<tablename_redacted> (Kudu)
10G impala::<tablename_redacted> (Kudu)
9.1G impala::<tablename_redacted> (Kudu)
1.2T impala::<tablename_redacted> (Kudu)
7.5G impala::<tablename_redacted> (Kudu)
2.6T impala::<tablename_redacted> (Kudu)
35.8T impala::<tablename_redacted> (Kudu)
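If the copied text needs to go into a script after all, a minimal Python sketch could parse it. It assumes the size suffixes are binary multiples (K = 1024 bytes and so on) and that every entry has the "size, name, (Kudu)" shape shown above; the function name and the sample table names are made up for illustration.

```python
import re

# Assumption: suffixes are binary multiples (K = 1024, M = 1024**2, ...).
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

# Matches entries such as "7.2T impala::some_table (Kudu)".
_ENTRY = re.compile(r"(\d+(?:\.\d+)?)([BKMGTP])\s+(\S+)\s+\(Kudu\)")


def parse_sizes(pasted):
    """Return a list of (table_name, size_in_bytes) from pasted chart text."""
    return [
        (name, int(float(num) * _UNITS[unit]))
        for num, unit, name in _ENTRY.findall(pasted)
    ]


if __name__ == "__main__":
    # Hypothetical table names; the sizes are taken from the list above.
    sample = (
        "7.2T impala::table_a (Kudu) "
        "4.1G impala::table_b (Kudu) "
        "98G impala::table_c (Kudu)"
    )
    # Print the largest tables first.
    for name, size in sorted(parse_sizes(sample), key=lambda e: e[1], reverse=True):
        print(f"{size:>18,d}  {name}")
```

The values are only as precise as the rounded figures the chart shows, so treat the byte counts as approximate.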
12-13-2018
04:01 PM
For Impala, hbase, hdfs and yarn services, I can specify memory allocation in static service pools UI in Cloudera Manager. Not so for Kudu.
So do I manually under-allocate all the others to leave room for whatever I want for Kudu, and then configure Kudu's "Tablet Server Hard Memory Limit"?
CDH 5.15.1, CM 5.15.1
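To make the arithmetic behind that question concrete, here is a rough Python sketch of the "under-allocate the others" approach. It assumes each static pool is expressed as a percentage of host memory; the host size, the percentages, the OS reserve, and the function name are all hypothetical.

```python
def kudu_memory_budget(host_mem_gib, pool_percent, os_reserve_gib=0.0):
    """Return the GiB left for Kudu after static pools and an OS reserve.

    pool_percent maps a service name to the percentage of host memory
    given to it in the static service pools UI (hypothetical inputs).
    """
    total_percent = sum(pool_percent.values())
    if total_percent > 100:
        raise ValueError(f"pools add up to {total_percent}%, which is over 100%")
    allocated_gib = host_mem_gib * total_percent / 100
    leftover_gib = host_mem_gib - allocated_gib - os_reserve_gib
    if leftover_gib < 0:
        raise ValueError("pools plus OS reserve exceed host memory")
    return leftover_gib


if __name__ == "__main__":
    # Hypothetical 256 GiB host with 75% handed to the other services.
    pools = {"impala": 45, "hbase": 10, "hdfs": 5, "yarn": 15}
    budget = kudu_memory_budget(256, pools, os_reserve_gib=8)
    print(f"Room left for the Kudu hard memory limit: {budget:.1f} GiB")
```

Whatever number comes out would then be the upper bound for the Kudu hard memory limit, set by hand.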
Labels: Apache Kudu, Cloudera Manager
06-12-2018
04:46 PM
Never mind my last comment: I was confused because the DISABLE_CODEGEN_ROWS_THRESHOLD setting @Tim Armstrong recommended was not yet documented, so I tried using the closest thing I found (SCAN_NODE_CODEGEN_THRESHOLD), which wasn't applicable to our query. It turns out that, even though it is not yet documented, DISABLE_CODEGEN_ROWS_THRESHOLD is available in our CDH 5.13 cluster and works as Tim suggested.
06-07-2018
10:05 AM
FYI @Tim Armstrong: sadly, setting SCAN_NODE_CODEGEN_THRESHOLD to any value had no effect, perhaps because, as I mentioned above, the slow codegen is NOT in a scan node but in a TOP-N towards the end of processing. We are considering setting DISABLE_CODEGEN=true on the URL for this connection alone (specific to user reports), though we'd need to watch carefully to make sure it doesn't make other reports slow. We'll probably also open a case with our EDH support to try to get to the bottom of why it's slow to begin with.
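The per-connection idea could look roughly like the Python sketch below, which turns codegen off (DISABLE_CODEGEN=true) only for reports known to be affected. It assumes the JDBC driver accepts query options as ;KEY=value pairs appended to the URL, which should be checked against the driver documentation; the host, port, database, and function names are hypothetical.

```python
# Hypothetical base URL shared by all reports.
BASE_URL = "jdbc:impala://impala-host.example.com:21050/default"


def build_jdbc_url(base_url, query_options=None):
    """Append query options to a JDBC URL as ;KEY=value pairs.

    Assumption: the driver forwards such pairs to the server as query options.
    """
    if not query_options:
        return base_url
    suffix = ";".join(f"{key}={value}" for key, value in sorted(query_options.items()))
    return f"{base_url.rstrip(';')};{suffix}"


def url_for_report(report_id, affected_reports):
    """Use the codegen-disabled URL only for reports in the affected set."""
    if report_id in affected_reports:
        return build_jdbc_url(BASE_URL, {"DISABLE_CODEGEN": "true"})
    return BASE_URL


if __name__ == "__main__":
    affected = {"user_report_17"}  # hypothetical report IDs
    print(url_for_report("user_report_17", affected))
    print(url_for_report("finance_report_3", affected))
```

Keeping the affected set explicit means the other reports keep the default behaviour and can be compared against it.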
06-06-2018
09:32 AM
Thanks @Tim Armstrong. Hmm, I can't find that option in the current docs. Is it just undocumented, or do you mean SCAN_NODE_CODEGEN_THRESHOLD? I ask because there is at least 1 node (from an often-used dimension that will apply to most queries) where the row estimate is 2.6 million (though after filtering it becomes only a few). Also, even if all scans are under 400K or whatever we set it to, will it help here, considering the slow codegen is in a TOP-N step towards the end?
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
...
03:SCAN HDFS 30 48.332ms 103.898ms 17 2.60M 10.93 MB 192.00 MB irdw_prod.media_dim md
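The concern about the estimate can be checked with a small Python sketch that compares the "Est. #Rows" values from a summary against a cutoff. This only illustrates the comparison being asked about and does not reproduce how Impala decides internally; it assumes K, M, and B suffixes mean thousands, millions, and billions, and the function names are hypothetical.

```python
# Assumption: summary counts use decimal suffixes (K, M, B).
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_row_estimate(text):
    """Convert a summary-style count such as '2.60M' or '17' to an int."""
    text = text.strip().upper()
    suffix = text[-1] if text[-1] in "KMB" else ""
    number = text[:-1] if suffix else text
    return round(float(number) * _SUFFIX[suffix])


def nodes_over_threshold(estimates, threshold):
    """Return {operator: estimated_rows} for operators above the cutoff."""
    parsed = {op: parse_row_estimate(est) for op, est in estimates.items()}
    return {op: rows for op, rows in parsed.items() if rows > threshold}


if __name__ == "__main__":
    # The scan node from the summary above, checked against a 400K cutoff.
    estimates = {"03:SCAN HDFS": "2.60M"}
    print(nodes_over_threshold(estimates, 400_000))
```

With the estimate from the summary, the scan node lands above a 400K cutoff even though it actually returns only 17 rows.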
06-05-2018
03:09 PM
Yeah, we definitely wouldn't want to do that globally. We tried running set DISABLE_CODEGEN=true; right before our SQL in the report, but the driver fails with "[Simba][JDBC](11300) A ResultSet was expected but not generated", which is really sad; I had thought we could specify any of these options right in the SQL. Doing so in the JDBC URL is not an option because the same connection is shared by all of our thousands of reports, only 10% or so of which are affected by this. @Tim Armstrong I tried to guess your Cloudera email and sent you the profile directly.
06-05-2018
10:04 AM
Thanks. Right, I know I can do that, but I'm hoping to figure out the root cause rather than paper over it. Plus it makes me nervous to do so for a whole class of queries/reports. That doc page does say: "... Do not otherwise run with this setting turned on, because it results in lower overall performance."