Member since 12-30-2015
73 Posts
3 Kudos Received
4 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1549 | 02-14-2020 11:38 PM |
| | 1576 | 02-13-2020 02:08 PM |
| | 2914 | 02-04-2020 10:14 PM |
| | 2278 | 01-26-2017 10:38 AM |
01-13-2016
11:10 PM
Ah, I got it. Tomorrow I will try comparing the first and second runs by checking the exec summary. Many thanks.
01-13-2016
10:59 PM
Thank you.
1. Do you have any plans to support bucketed tables?
2. Yes, I remember what you said about catalogd loading metadata. However, my question was how to avoid hitting the Impala cache, if Impala caches queries that ran before. The reason I'm asking is that I saw some differences in performance. For example, let's say catalogd already has the metadata for `Table1` because I already warmed up the table by running a `SELECT` query. After this, I run two different queries as follows:
- I run QUERY1: this takes 15 sec.
- I run QUERY2: this takes 20 sec.
- I run QUERY1 again: this takes 6 sec.
- I run QUERY2 again: this takes 10 sec.
How should I interpret this difference?
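When comparing runs like these, it can help to record the first (cold) run of each query separately from the repeated (warm) runs instead of mixing them. The Java sketch below is a hypothetical helper (`RunSummary` is not part of Impala) that does only that bookkeeping, using the timings from the post; it does not explain why the repeated runs are faster.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Impala): keeps the first (cold) run of a query
// separate from its repeated (warm) runs.
class RunSummary {
    final double coldSeconds;
    final double warmAvgSeconds;

    RunSummary(double coldSeconds, double warmAvgSeconds) {
        this.coldSeconds = coldSeconds;
        this.warmAvgSeconds = warmAvgSeconds;
    }

    // First element is the cold run; the remaining elements are averaged as warm runs.
    static RunSummary of(List<Double> secondsPerRun) {
        if (secondsPerRun.size() < 2) {
            throw new IllegalArgumentException("need one cold run and at least one warm run");
        }
        double warmTotal = 0;
        for (double s : secondsPerRun.subList(1, secondsPerRun.size())) {
            warmTotal += s;
        }
        return new RunSummary(secondsPerRun.get(0), warmTotal / (secondsPerRun.size() - 1));
    }

    // Cold time divided by average warm time; > 1 means the repeated runs were faster.
    double speedup() {
        return coldSeconds / warmAvgSeconds;
    }

    public static void main(String[] args) {
        // Timings from the post: QUERY1 15s then 6s, QUERY2 20s then 10s.
        RunSummary q1 = RunSummary.of(Arrays.asList(15.0, 6.0));
        RunSummary q2 = RunSummary.of(Arrays.asList(20.0, 10.0));
        System.out.printf("QUERY1: cold %.1fs, warm avg %.1fs, ratio %.2f%n",
                q1.coldSeconds, q1.warmAvgSeconds, q1.speedup());
        System.out.printf("QUERY2: cold %.1fs, warm avg %.1fs, ratio %.2f%n",
                q2.coldSeconds, q2.warmAvgSeconds, q2.speedup());
    }
}
```

With the numbers above, the cold/warm ratios come out to 2.5 for QUERY1 and 2.0 for QUERY2; running each query several more times would show whether the warm figure is stable.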
01-13-2016
08:20 PM
Alex, I have a question. As I mentioned, one of my tables (Table1) has 31k partitions. The table seems to perform OK. However, since you said that more than 10k partitions per table is not recommended, I have been thinking about reducing the number of partitions. Recently, I read about bucketing in Hive and created a test table that is partitioned at the first level and bucketed at the second level. BTW, I got this warning from Impala:
WARNINGS: For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=/user/hive/external_warehouse/test_table/yrmonth=201512/000151_0.snappy offset 134217728
Here are my questions.
Question 1: In general, how does Impala query performance hold up with Hive bucketing?
Question 2: How can I avoid using cached Impala query results? I'd like to do some performance testing with this new table. However, if I run the same query twice, it seems the second run uses data cached by the previous one.
Thank you
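The offset in the warning, 134217728 bytes, is exactly 128 MiB (128 * 1024 * 1024), which suggests (an assumption, not stated in the post) that the HDFS block size is 128 MiB and that 000151_0.snappy is larger than one block. The sketch below is a hypothetical helper (`BlockSpan` is illustrative, not an HDFS API) that computes how many blocks a file of a given size spans; a non-splittable compressed file spanning more than one block is the situation the warning describes.

```java
// Hypothetical helper (not an HDFS API): how many blocks does a file of a given size span?
class BlockSpan {
    // Assumption: 128 MiB block size, inferred from the warning's offset 134217728.
    static final long BLOCK_SIZE = 134_217_728L; // 128 * 1024 * 1024

    // Ceiling division using integer arithmetic; an empty file spans no blocks.
    static long blocksSpanned(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // A non-splittable compressed file (e.g. .snappy) that returns true here
    // is the case the Impala warning is about.
    static boolean spansMultipleBlocks(long fileSizeBytes, long blockSizeBytes) {
        return blocksSpanned(fileSizeBytes, blockSizeBytes) > 1;
    }

    public static void main(String[] args) {
        long[] sizes = {100L << 20, 128L << 20, 200L << 20}; // 100, 128 and 200 MiB
        for (long size : sizes) {
            System.out.printf("%d MiB -> %d block(s), multi-block: %b%n",
                    size >> 20, blocksSpanned(size, BLOCK_SIZE), spansMultipleBlocks(size, BLOCK_SIZE));
        }
    }
}
```

For the three sample sizes this reports 1, 1 and 2 blocks: a file of exactly 128 MiB still fits in one block, while anything larger starts a second block at offset 134217728.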
01-06-2016
08:49 AM
thx 🙂
01-06-2016
12:20 AM
Alex, Thank you very much. 🙂 -- Moonwon (Gatsby) Lee gatsbylee.com "Life isn't about waiting for the storm to pass, it's about learning to dance in the rain."
01-05-2016
05:27 PM
Alex, no problem at all. 🙂 At least I learned that the metadata of Table1 in catalogd is replaced with the metadata of a `dummy` table. 🙂
BTW, I don't know much about investigating heap memory in a Java process. However, since I always try to do something myself before asking further questions, I did some investigation. I googled and found a command called `jmap` (I'm not sure if this is the right one). `jmap` requires the PID of a Java process, but I don't know which PID I need to use (on my machines there are several Java processes, such as HBase, MR2, etc.). In order to try `jmap`, I used the PID of `catalogd` and checked the heap before and after running `INVALIDATE METADATA Table1` and `SELECT * FROM Table1 WHERE yearmonth=201512 LIMIT 1`:
./jmap -heap 4713
Am I doing this the right way?
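To see which PID belongs to which program, a Java 9+ sketch using the standard `ProcessHandle` API can list processes whose executable name matches, e.g. `catalogd` (on Linux, `pgrep -x catalogd` does the same from the shell). Whether the catalogd PID is the right target for `jmap` is not confirmed in this thread; the sketch only finds the PID. `FindPid` is a hypothetical name, and processes owned by other users may not expose their command, so they can be missed.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: list PIDs whose executable name matches, e.g. "catalogd".
class FindPid {
    // Pure matching rule, kept separate so it can be tested without real processes:
    // the command is either exactly the name or a path ending in "/name".
    static boolean commandMatches(String command, String name) {
        return command.equals(name) || command.endsWith("/" + name);
    }

    // Requires Java 9+. Processes whose command is not visible to this user are skipped.
    static List<Long> pidsFor(String name) {
        return ProcessHandle.allProcesses()
                .filter(p -> p.info().command().map(c -> commandMatches(c, name)).orElse(false))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "catalogd";
        System.out.println(name + " PIDs: " + pidsFor(name));
    }
}
```

Passing a different name as the first argument (for example the executable behind one of the HBase or MR2 processes) lists that program's PIDs instead.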
01-05-2016
03:13 PM
Alex, after executing `INVALIDATE METADATA Table1`, I checked the catalogd UI to see what's been cached. As you mentioned before, the cached metadata for Table1 is replaced with a `dummy` table like this:
TCatalogObject {
01: type (i32) = 3,
02: catalog_version (i64) = 733,
05: table (struct) = TTable {
01: db_name (string) = "default",
02: tbl_name (string) = "Table1",
04: id (i32) = 730,
},
}
However, it didn't help reduce memory. 🙂 And yes, catalogd is started with `--load_catalog_in_background=false`. Thank you.
01-05-2016
02:02 PM
W0105 16:59:22.911566 28206 PlanNode.java:545] overflow when multiplying cardinalities: 9223372036854775807, 9
W0105 16:59:22.909112 28206 PlanNode.java:545] overflow when multiplying cardinalities: 86038345052864512, 75227480
W0105 16:59:22.897722 28206 PlanNode.java:545] overflow when multiplying cardinalities: 2389953856178920, 12537913
W0105 16:59:22.886450 28206 PlanNode.java:545] overflow when multiplying cardinalities: 2389953856178920, 12537913
I started getting these messages. How should I interpret them?
Update: I found this comment by Martin Grund in a Google Group:
""" It happens when multiplying two Java long values results in an overflow, see https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java#L541. Cardinalities are used in optimizing the plan and for example for joins table cardinalities are multiplied and may result in very large values. If the multiplication overflows, we use Long.MAX_VALUE. """
Question: Is there anything I need to do to prevent this issue?
Thank you
Gatsby
Labels: Apache Impala
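The behavior Martin describes can be sketched as a saturating multiplication: multiply two non-negative cardinalities and fall back to `Long.MAX_VALUE` on overflow. This is a minimal Java sketch using the standard `Math.multiplyExact` (Java 8+), not Impala's actual PlanNode code; `Cardinality` is a hypothetical name. All three value pairs from the log lines above overflow a 64-bit long, so each would be capped at Long.MAX_VALUE.

```java
// Hypothetical sketch (not Impala's PlanNode code) of the overflow handling described above.
class Cardinality {
    // Assumes non-negative inputs, as cardinality estimates are.
    // Math.multiplyExact throws ArithmeticException if the product overflows a long.
    static long multiplySaturating(long a, long b) {
        try {
            return Math.multiplyExact(a, b);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        long[][] pairs = {
            {9223372036854775807L, 9L},         // from the first log line
            {86038345052864512L, 75227480L},    // from the second log line
            {2389953856178920L, 12537913L},     // from the third and fourth log lines
            {1_000_000L, 1_000L}                // small values: no overflow
        };
        for (long[] p : pairs) {
            System.out.println(p[0] + " * " + p[1] + " = " + multiplySaturating(p[0], p[1]));
        }
    }
}
```

The first log line also shows that once an estimate is already Long.MAX_VALUE, any further multiplication by a value greater than 1 overflows again, so the same warning can repeat for a single plan.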
01-05-2016
11:33 AM
Alex, I ran `INVALIDATE METADATA Table1`, but it doesn't look like `catalogd` releases memory. Table1, which I used for testing, has 26k partitions and is 4.1T (12.2T) in size. I also did one more test: restarting `catalogd`. Before restarting, the memory usage of `catalogd` was 25G VIRT and 13G RES. After restarting, it dropped to 10G VIRT and less than 1G RES. Then I accessed Table1 and could see `catalogd` start using more memory. I guess this is because `catalogd` started caching the full metadata for Table1. Any comments? Thank you 🙂
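One possible explanation (an assumption, not confirmed in this thread) for RES staying high after `INVALIDATE METADATA` is that a JVM may keep heap memory committed after the garbage collector frees the objects in it, so process-level numbers such as RES need not drop even when heap usage does. The Java sketch below uses the standard `java.lang.management` API to print used versus committed heap for its own JVM before allocating, after allocating, and after requesting a GC; it cannot inspect catalogd, and the numbers vary by JVM and collector. `HeapReport` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: compare heap "used" with heap "committed" in the current JVM only.
class HeapReport {
    private static final long MIB = 1024L * 1024L;

    // Formats the figures in MiB; getMax() returns -1 when the maximum is undefined.
    static String format(MemoryUsage u) {
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() / MIB) + " MiB";
        return String.format("used=%d MiB, committed=%d MiB, max=%s",
                u.getUsed() / MIB, u.getCommitted() / MIB, max);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("start:      " + format(memory.getHeapMemoryUsage()));

        // Allocate roughly 32 MiB of short-lived data.
        byte[][] chunks = new byte[32][];
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new byte[1 << 20];
        }
        System.out.println("allocated:  " + format(memory.getHeapMemoryUsage()));

        chunks = null;  // drop the only reference so the arrays become garbage
        System.gc();    // only a request; the JVM is free to ignore it
        System.out.println("after gc(): " + format(memory.getHeapMemoryUsage()));
    }
}
```

If "used" drops after the GC while "committed" stays roughly the same, that illustrates the distinction; neither outcome is guaranteed.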
01-04-2016
11:31 PM
Alex, let me repeat what you explained to make sure I understand correctly. You are saying that if I run `INVALIDATE METADATA Table1;`, the cached metadata for Table1 in catalogd will be replaced with the metadata of a 'dummy' table, which uses an insignificant amount of memory. In other words, catalogd releases the memory used for the full metadata of Table1, and the full metadata of Table1 will be loaded again when the table is accessed. I knew that `INVALIDATE METADATA` marks the metadata for Table1 as stale. However, I didn't know that the command replaces the cached metadata. 🙂 Am I understanding your explanation correctly?
----
"INVALIDATE METADATA causes the metadata for that table to be marked as stale, and reloaded the next time the table is referenced." - from http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_invalidate_metadata.html
---
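The behavior summarized here (placeholder on invalidate, full load on next access) can be illustrated with a toy Java model. `LazyCatalog`, `Placeholder` and `LoadedTable` are hypothetical classes, not Impala's catalog implementation; the loader function stands in for reading metadata from the metastore. The model only tracks which entry is held; it says nothing about when the process returns memory to the OS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model (hypothetical classes, not Impala's): invalidate swaps in a lightweight
// placeholder, and the next access loads the full metadata again.
class LazyCatalog {
    interface TableEntry {}

    // Holds only the name, like the 'dummy' TTable with just db_name, tbl_name and id.
    static final class Placeholder implements TableEntry {
        final String name;
        Placeholder(String name) { this.name = name; }
    }

    // Stands in for the full metadata (partitions, files, ...).
    static final class LoadedTable implements TableEntry {
        final String name;
        final int partitionCount;
        LoadedTable(String name, int partitionCount) {
            this.name = name;
            this.partitionCount = partitionCount;
        }
    }

    private final Map<String, TableEntry> tables = new HashMap<>();
    private final Function<String, LoadedTable> loader;
    private int loadCount = 0;

    LazyCatalog(Function<String, LoadedTable> loader) { this.loader = loader; }

    // Analogue of INVALIDATE METADATA t: drop the full entry, keep a placeholder.
    void invalidate(String name) { tables.put(name, new Placeholder(name)); }

    // Analogue of referencing the table in a query: load it unless already loaded.
    LoadedTable get(String name) {
        TableEntry entry = tables.get(name);
        if (entry instanceof LoadedTable) {
            return (LoadedTable) entry;
        }
        LoadedTable loaded = loader.apply(name);
        loadCount++;
        tables.put(name, loaded);
        return loaded;
    }

    boolean isLoaded(String name) { return tables.get(name) instanceof LoadedTable; }
    int loadCount() { return loadCount; }

    public static void main(String[] args) {
        // Hypothetical loader; 26000 echoes the partition count mentioned in this thread.
        LazyCatalog catalog = new LazyCatalog(n -> new LoadedTable(n, 26000));
        catalog.get("Table1");                            // first access loads
        System.out.println(catalog.isLoaded("Table1"));   // true
        catalog.invalidate("Table1");
        System.out.println(catalog.isLoaded("Table1"));   // false
        catalog.get("Table1");                            // next access reloads
        System.out.println(catalog.loadCount());          // 2
    }
}
```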