Is there any Impala SQL command which can remove table from Impala, not from Hive?
Labels: Apache Hive, Apache Impala, HDFS
Created on ‎01-04-2016 07:16 PM - edited ‎09-16-2022 02:55 AM
Hello,
It seems the memory used by catalogd depends on the following, based on the comment here:
- the number of HDFS files and blocks
- the number of databases, tables, and partitions.
This suggests that removing unnecessary tables from Impala, but not from Hive, could reduce memory usage in catalogd.
# Question
Is there any Impala SQL command which can remove tables from Impala, not from Hive?
Thank you again.
Created ‎01-04-2016 10:43 PM
Since both Impala's and Hive's Metadata are backed by the Hive Metastore, you cannot completely remove a table from only one or the other.
By default, Impala loads the metadata of tables lazily, i.e., only when a table is accessed in Impala. After the initial loading, the table metadata is cached in the catalogd and impalads.
If your goal is to reduce the memory burden on the catalogd, then you can call "invalidate metadata <table_name>" in Impala on those tables you want to "remove" from Impala. This will replace the full metadata for that table with a "dummy" table entry which uses an insignificant amount of memory. However, if you access that table again in Impala, then the metadata will be loaded again. So as long as you don't access those tables whose metadata has not been loaded, you are not using much memory.
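The per-table form described above can be sketched as follows (the database and table names are placeholders, not from the thread):

```sql
-- Evict the cached metadata for one table only; catalogd keeps a
-- lightweight placeholder entry until the table is accessed again.
-- Names below are hypothetical.
INVALIDATE METADATA my_db.unused_table;

-- Note: omitting the table name invalidates metadata for ALL tables,
-- forcing a reload on next access cluster-wide.
INVALIDATE METADATA;
```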
Created ‎01-04-2016 11:31 PM
Alex,
Let me repeat what you explained to make sure I understand correctly.
You are saying that if I run `INVALIDATE METADATA Table1;`,
the cached metadata for Table1 in catalogd will be replaced with the metadata of a 'dummy' table, which uses an insignificant amount of memory.
In other words, catalogd releases the memory used for the full metadata of Table1.
And the full metadata of Table1 will be loaded again when the table is accessed.
I knew that `INVALIDATE METADATA` marks the metadata for Table1 as stale.
However, I didn't know that the command replaces the cached metadata. 🙂
Am I understanding what you explained correctly?
----
"INVALIDATE METADATA causes the metadata for that table to be marked as stale, and reloaded the next time the table is referenced."
----
Created on ‎01-05-2016 11:33 AM - edited ‎01-05-2016 11:38 AM
Alex,
I ran `INVALIDATE METADATA Table1`, but it doesn't look like `catalogd` releases the memory.
The Table1 I used for testing has 26k partitions, and its size is 4.1T ( 12.2T ).
I also did one more test: restarting `catalogd`.
Before restarting `catalogd`, the memory usage was 25G VIRT and 13G RES.
After restarting, the memory usage of `catalogd` became 10G VIRT and less than 1G RES.
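The VIRT/RES numbers quoted above can be checked like this; the process name `catalogd` is an assumption about the deployment:

```shell
# Show PID, virtual size (VSZ ~ VIRT) and resident size (RSS ~ RES),
# both in KB, for the catalogd process by command name.
ps -o pid,vsz,rss,comm -C catalogd
```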
Then I accessed Table1 and could see that `catalogd` started using more memory.
I guess it might be because `catalogd` started caching the full metadata for Table1.
Any comment?
Thank you 🙂
Created ‎01-05-2016 03:00 PM
That's strange and somewhat unexpected. My suggestion is not exactly a tested scenario and more of a side effect of our implementation of "invalidate metadata", so maybe there are issues I am not thinking of that would prevent the objects from being cleaned up by the Java GC.
After doing the "invalidate metadata", are you sure the table is not being accessed? To verify the state of metadata loading you can go to the catalogd Web UI (default port 25020) and inspect the contents of your table metadata via the /catalog tab.
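Checking the /catalog page can also be scripted; a minimal sketch, assuming the catalogd host is localhost, the default web UI port, and a hypothetical table name:

```shell
# Fetch the catalogd catalog page (default web UI port 25020) and look
# for the table of interest; host and table name are assumptions.
curl -s http://localhost:25020/catalog | grep -i table1
```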
Make sure that you are starting the catalogd with --load_catalog_in_background=false, but I assume that's already the case since it's the default.
Yes, upon first access of a table, the catalogd will load and cache the metadata for that table.
Created ‎01-05-2016 03:13 PM
Alex,
After executing `INVALIDATE METADATA Table1`, I checked the catalogd UI to see what's been cached.
Like you mentioned before, the cached metadata for Table1 is replaced with a `dummy` table entry like this:
TCatalogObject { 01: type (i32) = 3, 02: catalog_version (i64) = 733, 05: table (struct) = TTable { 01: db_name (string) = "default", 02: tbl_name (string) = "Table1", 04: id (i32) = 730, }, }
However, it didn't contribute to reducing memory 🙂
And yes, catalogd starts with `--load_catalog_in_background=false`.
Thank you.
Created ‎01-05-2016 04:14 PM
Sorry to hear it did not help to reduce the memory consumption. I'm not really sure why that would be the case. If you want to investigate further, I'd recommend getting heap dumps of the Java process before and after the "invalidate metadata" to see where the memory is going.
Created ‎01-05-2016 05:27 PM
Alex,
No problem at all. 🙂
At least I learned that the metadata of Table1 in catalogd is replaced with the metadata of a `dummy` table. 🙂
BTW, I don't know much about investigating heap memory in a Java process.
However, since I always do some research before asking further questions, I did some investigation.
I googled and found a command called `jmap`. ( I'm not sure if this is the right one. )
`jmap` requires the PID of a Java process, but I don't know which PID I need to use.
( On my machines there are several Java processes, like HBase, MR2, etc. )
In order to try `jmap`, I used the PID of `catalogd` and checked before/after
`INVALIDATE METADATA Table1` and `SELECT * FROM Table1 WHERE yearmonth=201512 LIMIT 1`:
`./jmap -heap 4713`
Am I doing this the right way?
Created ‎01-06-2016 12:15 AM
Correct. You can use `jmap` to get heap dumps, and `jhat` or other heap dump analysis tools to read them. I'd recommend trying a few heap analysis tools to see which one you like.
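The before/after workflow might look like this; the process lookup, dump file paths, and the impala-shell step are assumptions, not commands from the thread:

```shell
# Find the catalogd JVM pid (matching by process name is an assumption).
CATALOGD_PID=$(pgrep -f catalogd | head -1)

# Take a binary heap dump of live objects before the test ...
jmap -dump:live,format=b,file=/tmp/catalogd-before.hprof "$CATALOGD_PID"

# ... run INVALIDATE METADATA Table1 in impala-shell, then dump again.
jmap -dump:live,format=b,file=/tmp/catalogd-after.hprof "$CATALOGD_PID"

# Browse a dump with jhat (serves an HTTP UI, port 7000 by default);
# comparing the two dumps shows where the memory is retained.
jhat /tmp/catalogd-after.hprof
```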