Member since
12-30-2015
01-04-2016
07:16 PM
Hello,

Based on the comment here ( https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/How-to-predict-how-much-memory-catalogd-needs/m-p/35735#U35735 ), it seems the memory used by catalogd depends on:
- the number of HDFS files and blocks
- the number of databases, tables, and partitions

This could mean that removing unnecessary tables from Impala, but not from Hive, would reduce memory usage in catalogd.

# Question
Is there any Impala SQL command that can remove tables from Impala but not from Hive?

Thank you again.
Labels: Apache Hive, Apache Impala, HDFS
01-04-2016
07:03 PM
Alex, Thank you so much for your time and explanation. All your comments are really valuable to me 🙂 Thank you again.
01-04-2016
05:18 PM
This follows up on the post at https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/why-show-column-stats-lt-table-name-gt-doesn-t-show-statistics/m-p/35681#U35681 .
01-04-2016
05:14 PM
Alex,

Thank you for your reply. I am aware of what you explained in your reply about `COMPUTE INCREMENTAL STATS`.

I have a question about the scenario where data changes in a few partitions. What do you mean by a few partitions? My table has two levels of partitions, yearmonth/name_prefix. Every month a new yearmonth is added, and the new yearmonth has around 1360 name_prefix partitions. In this case, can I say there are only a few partitions?

You also mentioned that I can run `COMPUTE INCREMENTAL STATS` for specific partitions. With this approach, I have to specify both partition levels with constant values, for example (yearmonth=201512, name_prefix=ae). This means I need to execute `COMPUTE INCREMENTAL STATS` around 1300 times, since there are around 1300 name_prefix partitions under yearmonth=201512.

I have seen an open ticket related to this ( https://issues.cloudera.org/browse/IMPALA-1570 ). Maybe, after this ticket is resolved, I will be able to run `COMPUTE INCREMENTAL STATS` with only the first-level partition specified, like (yearmonth=201512, name_prefix).

Question 1: Based on what you said, the way I use `COMPUTE INCREMENTAL STATS` doesn't work as intended. It just does a full recomputation. Am I right?

Question 2: With my current use case, is it better to use `COMPUTE STATS`?

Question 3: If a user accesses a table while `COMPUTE STATS` is running on it, what is the expected behavior in Impala?

I really appreciate your help 🙂
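The per-partition approach described above (one `COMPUTE INCREMENTAL STATS` statement for each yearmonth/name_prefix pair) could be generated by a script rather than typed by hand. The following is only a minimal Python sketch under stated assumptions: `my_table`, the function name, and the sample prefixes are hypothetical, name_prefix is assumed to be a STRING partition column with simple alphanumeric values, and the script only builds the statement text. It does not connect to Impala.

```python
def incremental_stats_statements(table, yearmonth, name_prefixes):
    """Build one COMPUTE INCREMENTAL STATS statement per partition.

    Assumes yearmonth is an integer partition key and name_prefix is a
    STRING partition key (an assumption, not confirmed by the post).
    """
    statements = []
    for prefix in name_prefixes:
        # Keep the sketch safe and simple: refuse values that would need quoting rules.
        if not prefix.isalnum():
            raise ValueError(f"unsupported name_prefix value: {prefix!r}")
        statements.append(
            f"COMPUTE INCREMENTAL STATS {table} "
            f"PARTITION (yearmonth={int(yearmonth)}, name_prefix='{prefix}');"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table name and prefixes, for illustration only.
    for stmt in incremental_stats_statements("my_table", 201512, ["ae", "af"]):
        print(stmt)
```

The printed statements could then be saved to a file and run through whatever client is already in use. Whether 1300 separate statements are actually faster than one full recomputation is exactly the open question in this post, so this is not a recommendation.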
01-04-2016
04:53 PM
Thx. 🙂
12-31-2015
11:41 AM
Hello,

I have a question about the memory used by catalogd. Here is some info:
- 700 tables on Impala
- 32G physical memory on NameNode
- impala-server is not running on NameNode ( I stopped it since it uses too much memory. )
- catalogd's memory usage ( VIRT: 25.2G, RES: 16G )

My questions are:
- How can I predict how much memory catalogd needs?
- What factors contribute to the memory usage of catalogd?
- Is there any way to force catalogd to release or flush memory?

Thank you
Gatsby
Labels: Apache Impala
12-31-2015
10:50 AM
What does the warning mean?

I have a table with two levels of partitions, yearmonth and name_prefix. Every month, I add new data in a new yearmonth partition. Once the ETL is finished, I run `COMPUTE INCREMENTAL STATS`. It takes about 2 hours, and it seems to take longer and longer.

Based on what I have read, `COMPUTE INCREMENTAL STATS` only needs to gather statistics for the partitions that have a false value for `Incremental stats`. However, the time spent running `COMPUTE STATS` and `COMPUTE INCREMENTAL STATS` is very similar. Am I missing anything?

Thank you
Gatsby

+---------------------------------------------+
| summary |
+---------------------------------------------+
| Updated 1372 partition(s) and 29 column(s). |
+---------------------------------------------+
WARNINGS: Too many partitions selected, doing full recomputation of incremental stats
Fetched 1 row(s) in 5871.22s
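To see how many partitions a run would actually have to process, one option is to check which partitions still report `false` in the `Incremental stats` column of `show table stats`. The Python sketch below is only an illustration under stated assumptions: it parses the pipe-delimited text as printed by the shell, the function name and the `num_partition_cols` parameter are hypothetical, and it assumes the partition key columns come first in the output.

```python
def partitions_missing_incremental_stats(table_text, num_partition_cols=1):
    """Return partition key tuples whose 'Incremental stats' cell is 'false'.

    table_text is the text printed by `show table stats <table_name>`.
    """
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders and any non-table lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    flag_idx = header.index("Incremental stats")
    missing = []
    for row in data:
        if row[0] == "Total":
            continue  # summary row, not a partition
        if row[flag_idx].lower() == "false":
            missing.append(tuple(row[:num_partition_cols]))
    return missing
```

For a table partitioned by yearmonth and name_prefix, `num_partition_cols=2` would be the natural setting, assuming the output lists one column per partition key.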
12-30-2015
04:27 PM
Thank you for your investigation.

I am using `COMPUTE INCREMENTAL STATS` for a table to which I add a new yearmonth partition every month. However, `COMPUTE INCREMENTAL STATS` takes about 2 hours, and it takes longer and longer. As far as I can tell, this is because `COMPUTE INCREMENTAL STATS` performs two operations on the given table:
1. counting the total number of rows.
2. doing a full recomputation of incremental stats if too many partitions are selected.

I'm looking for other ways to reduce the time needed to gather statistics.

This is the log I get during `COMPUTE INCREMENTAL STATS`:

+---------------------------------------------+
| summary |
+---------------------------------------------+
| Updated 1372 partition(s) and 29 column(s). |
+---------------------------------------------+
WARNINGS: Too many partitions selected, doing full recomputation of incremental stats
Fetched 1 row(s) in 5871.22s

Again, thank you for your help.
12-30-2015
04:10 PM
Thank you so much 🙂
12-30-2015
03:45 PM
Thank you for your feedback.

Yes, I did run `INVALIDATE METADATA <table_name>;`.

As I mentioned in the question, I can see the updated table statistics with `show table stats <table_name>` after executing `ANALYZE TABLE Table1 PARTITION(yearmonth) COMPUTE STATISTICS;`

Query: show table stats products
+-----------+----------+--------+----------+--------------+-------------------+---------+-------------------+
| yearmonth | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
+-----------+----------+--------+----------+--------------+-------------------+---------+-------------------+
| 201404 | 48799023 | 9 | 866.37MB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201405 | 54633812 | 11 | 968.30MB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201406 | 49516351 | 11 | 873.53MB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201407 | 51782891 | 12 | 934.59MB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201408 | 73191206 | 13 | 1.23GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201409 | 76577223 | 17 | 1.31GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201410 | 78462480 | 17 | 1.37GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201411 | 78778172 | 17 | 1.37GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201412 | 78778304 | 17 | 1.37GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201501 | 78914761 | 17 | 1.37GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201502 | 79112909 | 17 | 1.36GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201503 | 79270403 | 17 | 1.37GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201504 | 79315850 | 17 | 1.37GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201505 | 79626491 | 17 | 1.37GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201506 | 79644598 | 17 | 1.42GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201507 | 79741074 | 17 | 1.42GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201508 | 79798934 | 18 | 1.43GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201509 | 79920969 | 18 | 1.43GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201510 | 79950252 | 18 | 1.43GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201511 | 79965601 | 18 | 1.44GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| 201512 | 80038743 | 5 | 1.41GB | NOT CACHED | NOT CACHED | RC_FILE | false |
| Total | -1 | 320 | 27.03GB | 0B | | | |
+-----------+----------+--------+----------+--------------+-------------------+---------+-------------------+

However, I can't see the updated column statistics with `show column stats <table_name>` after executing `ANALYZE TABLE Table1 COMPUTE STATISTICS FOR COLUMNS;`

Query: show column stats products
+-----------------+--------+------------------+--------+----------+----------+
| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
+-----------------+--------+------------------+--------+----------+----------+
| product_name | STRING | -1 | -1 | -1 | -1 |
| token_count | INT | -1 | -1 | 4 | 4 |
| currency | STRING | -1 | -1 | -1 | -1 |
| yearmonth | INT | 21 | 0 | 4 | 4 |
+-----------------+--------+------------------+--------+----------+----------+

Of course, I executed `INVALIDATE METADATA <table_name>` before checking the statistics in Impala.

Thank you.