Created 05-21-2025 07:00 AM
Hello.
We have ORC tables loaded from Hive (Tez) and queried from Impala. Due to their number of partitions, we had to increase the TEZ AM task's heap, and the note https://my.cloudera.com/knowledge/Tez-AMtask-fails-with-OOME-because-of-huge-impala-stats-chunk?id=3... seems to match our issue
We'd like to reduce the AM memory footprint, but the solution doesn't work in our 7.1.9 and 7.3.1 environments:
* After setting the option in the hive-site.xml, the "impala_intermediate_stats_chunk*" messages are still present in the Hive's plans.
* Setting the option at the session level returns the error "Cannot modify hive.plan.mapwork.serialization.skip.properties at runtime. It is not in the list of params that are allowed to be modified at runtime (state=42000,code=1)".
* https://github.com/apache/hive/pull/2620 shows that the option should be allowed to be set at the session level, as the HIVE-25501 was implemented in the CDP-7.1.7.
After some research we came across the Jira https://issues.apache.org/jira/browse/HIVE-27114, where the option hive.metastore.partitions.parameters.exclude.pattern was introduced. Setting it to 'impala_intermediate_stats_chunk%', does the trick, as the impalachunks* messages no longer appear, but:
* It only seems to work in the 7.3.1 Hive's version, so it doesn't apply to our 7.1.9 installation.
* We're worried about this note, https://issues.apache.org/jira/browse/IMPALA-12743, solved in Impala 4.4, where it seems that Impala would miss partition stats due to Hive's filtering after using the previous option.
We'd like to know:
* What are we doing wrong with the hive.plan.mapwork.serialization.skip.properties option in the CDP 7.1.9 installation ?
* Is it safe to use the hive.metastore.partitions.parameters.exclude.pattern option in CDP 7.3.1 ?
Thanks in advance for your help and time.
Regards.
Created 05-22-2025 03:00 AM
Validate the explain plan using explain extended <query> , if the explain plan contains "impala_intermediate_stats_chunk" . set the following session-level property to run the impacted query:
hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.*
When setting these property, you may encounter an error:
Cannot modify hive.plan.mapwork.serialization.skip.properties at runtime. It is not in the list of parameters that are allowed to be modified at runtime (state=42000, code=1)
To avoid this issue, whitelist the parameter and restart HS2 before retrying the query setting.
hive.security.authorization.sqlstd.confwhitelist.append=hive\.plan\.mapwork\.serialization\.skip\.properties
Created 05-21-2025 10:46 PM
@AEAT, Welcome to our community! To help you get the best possible answer, I have tagged in our Hive experts @smruti, @asish @udeshmukh, who may be able to assist you further.
Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,Created 05-22-2025 03:00 AM
Validate the explain plan using explain extended <query> , if the explain plan contains "impala_intermediate_stats_chunk" . set the following session-level property to run the impacted query:
hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.*
When setting these property, you may encounter an error:
Cannot modify hive.plan.mapwork.serialization.skip.properties at runtime. It is not in the list of parameters that are allowed to be modified at runtime (state=42000, code=1)
To avoid this issue, whitelist the parameter and restart HS2 before retrying the query setting.
hive.security.authorization.sqlstd.confwhitelist.append=hive\.plan\.mapwork\.serialization\.skip\.properties
Created 05-22-2025 11:07 PM
Thanks for your quick response.
After adding the parameter to the whitelist and restarted the HS2, the property can be set at session-level without error.
But the impala_intermediate_stats_chunk messages are still there, surely I'm doing something wrong.
The following test was performed in our CDP-7.1.9 environment. HS2 version is 3.1.3000.7.1.9.1000-103
0: jdbc:hive2://docker-manager.desa.bigdata.t> set hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.*;
No rows affected (0,012 seconds)
0: jdbc:hive2://docker-manager.desa.bigdata.t> set hive.plan.mapwork.serialization.skip.properties;
+----------------------------------------------------+
| set |
+----------------------------------------------------+
| hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.* |
+----------------------------------------------------+
1 row selected (0,026 seconds)
0: jdbc:hive2://docker-manager.desa.bigdata.t> explain extended select count(*) from tb_logs_internet_orc where anio=2025 and mes=5 and dia=1 and fichero_carga like '%gz';
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| OPTIMIZED SQL: SELECT COUNT(*) AS `$f0` |
| FROM `prometeo`.`tb_logs_internet_orc` |
| WHERE `fichero_carga` LIKE '%gz' AND `anio` = 2025 AND `mes` = 5 AND `dia` = 1 |
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Tez |
| DagId: hive_20250523074558_870356c3-0a31-473e-9f15-93f02b670066:3 |
| Edges: |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) |
| DagName: hive_20250523074558_870356c3-0a31-473e-9f15-93f02b670066:3 |
...
...
| STATS_GENERATED TASK |
| bucket_count 100 |
| bucket_field_name fecha_hit |
| file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat |
| file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat |
| impala.events.catalogServiceId eb75d6ab377d4a7f:816c40791e2fe086 |
| impala.events.catalogVersion 42082 |
| impala_intermediate_stats_chunk0 HBbi43gAGzGMC2Rlc2NyaXBjaW9uGAz/AP8AJgAAAv8A1wARFgAVABcAAAAAAAAAABbi43gWABYAHAAcAAAJb3JnYW5pc21vGAz/AP8A1AAAA/8AKQARFgAVFBcAAAAAAAAkQBbi43gWABYAHAAcAAANZmljaGVyb19jYXJnYRhaHQAAAVsAAAEkAAABFwAAAiQAAAISAAACEAAAAi0AAAKGAAACFQAABEgAAAElAAABFQAAAQAAAAEwAAABAwAAAR4AAARBAAADVgAAAkYAAAEFAAAFNQAAASgAERYAFWoXdbqaAOBkSUAW4uN4FgAWABwAHAAAA25pZhgUYAAAAf8ADQAAAf8A/wAzAAABWQARFgAVEhdAjdbyFcIgQBbi43gWABYAHAAcAAAQdGFtYW5pb19wZXRpY2lvbhiACAIBAgQBAAABAwQAAAEEAwIBAQQCAAEBAgABAQECAwEAAwMBAgEAAQQCAgUCBAAFAwABAgIEBgMBAwIGBAACAQEABAQFAQEFAgMCAQQCAgMCAQEBAgMDBQEAAwACAQUFAgICAQICAQICAgMEAAACAAUCAgABAAIBAgQBAgIBAAEDBQQFBAICAQAAAwMAAAQEAgMCAQAABQACAAMFAgUGAwEBAQEBAgEEBgUABAgAAwIBAgEBAQUAAAEEAgAAAgICAAYBAQEDAQIABAUEAwIABQIBAQYAAQIAAgIHCgMEAwIABAMDAgIAAAIAAwEAAQAFAQEEAQQEBAICAgEAAAUDAAMCAQQCAgIFBgECAAMBAgEBAwUDAAYCAwECCAEBAgQBAgEDAgMFBQECAQUBBAcEAAUBBAMFAAIBAQEEAgEBAQMGBAEAAwEDAAEAAwEBAAIBAQICCQAAAwMBAQMCAwADAQIAAQABAQIDAgECAgADAgIFAwMBAAIDAwUDAAAEAwACAgIJAQIEAQAEAgQGBQICAAYDAwIAAQEEAgICAgcEAAEDAQIAAgABBAABAQMBBAIAAAEBAQECAQEGAQMCBQICAgIDAgIBAgECAAACAwcCBQAEBQEBAgEEBwEBAAIAAgMBBgEBAAEFAgIFAQYCAgICAAMCAAICAAADAQAAAQEBAQIDAgABAAcGAQQEAgMCBAECAQIDAwABAQEBAQMFAwIGAQAECQEEAgUBAAAAAwEBAgMBBgEBAgICAQEAAgMDBQMECAEBBgMCBwADAwICBQUCBQMDAgUDBQIAAQICAQEBAQAAAgECAAMAAgACAgAHAQMDAQEAAwIDAgIBAgIAAwcAAwEAAAMBAQEBAQADBAEBAQIBAQEDAAEACQEAAwICAAABAwIEAAIBAAAAAQEHAAIBBAcAAAkBAAADAwQFAQQAAwICBAECAQACBgADAwYDAQMAAwEBAwQFAgIBAwUCAQECAQIFBQcABAEDAwkCAAAAAgECAAABAQMCAwUAAQMABAMEBgIEAgQBAwEEAgIDAgEBAgEEBAEGBAAABAYGBAMDAgMCAQQEAwAAAgMBAQUDAAsBAgEDAAICAAEAAAMAAgMCAgICAwAFAQILAAIHAwIABwABAAIGAQEAAAIBAgEAAgECAQMEAgECAAACBAUCAwQAAgIAAQMDCAUAAQQAAwMDAQECAQUDAwECAwEBAgQBAAMDBgECAgIEAQEDAwEFAQEDBAQBAAEDAwQCAwMDAgMAAwABAgMBBAEAAwEBAQECBwICAgEDAwIDAQMDAwMAAAMBAgMBAQEEAQACBgEABAECAQIAAQUDBgEDAQIGAgEAAgUBAQIAAgIDBQQBAAUABQQBAwISFgAVABcAAAAAAAAgQBbi43gWABYAHAAcAAAPeF9mNV9zb3VyY2Vwb3J0GIAIBQUDBgUFAQYFBAgGBgYGBQUGBAMHBwUEBgQGAwMGCgQDBQQFBQMEBg4FAwUDBwUEAwUHBQYFBAIFCAwFBAcIBwsEBQMEBAYEAgUEBAMFAgQHBAQFBAQHBwkFAwYGDgYFBAUDBAkFAwQHCQUFCAYDBQYFBgMFAwoKBAYEBAUDBAUFAwkDAw0EBgYEBgQCAwQIBgQEBAUDAwUDBQQGBgcGBQMDBwQFBgUJBwgDAwUDCAQHBgYGBAQDBgMHBAUEBAUDBQUDBgQJBQQEBAgFBAUFAwUDBwQFBQIEBAYHBQQFBgYHBwMDCwQGAwwFBAcHBAcDCAkDAwQEAwkEBAUKAwUIBQYFBQUGBwQFCAMFBgQDAwQEBwIGAwYDAwcHBQcDAwUHBgcHBAUEBwcGBwYGAwYDBQQEBAYHBAUFBAQGBAQIBAkEAgYGAwMFBAUKBAMGBAcEBwMDBAMEBgQFBAMFBAUHBAkEBQQHAwQEBAMEBwQDBQUEBggEAwIEBwIEBAQGBAQFCAcFBQsHBAIEBAYHBgUFBAQIBQgGBQIFBggDBwcFBQUDBAgEAwUFBwQHBAQDBAQFBQQMBgcGAwUECgQGAwcFBAQGBAcECAoGAwUCBAQHBQQDBAUFBAQFAgUEBQUGBwMHBQQHBAQDCAYGBwQEBwMEBAYHBQUGBgcDDAcEAgcFBwYFAwUFBQYEAgMFBQwIBAYDBgMDAwgICAYFBAcEAgMEAwUEBgQGBQYEAwUHAgUFBQUDBQcEAwQEBAcDBAcHBAQFBAMFAwMJBgYEBAcDBgYEBQIEBQYHBQYIBAQDAwUFBAUICQcEBwUKDAQCBwUGBgQFBAYEAwYDCAMHCAQDAggEAwMFBQcFBgMEBgUDBQQDBAYEAwMDBwcFBgQEBQQICAMGBAQGBwUFBAMHBggGAgQIBAQFAwMEBgQEBAQEBQUEBgQEBQgGBAgEBgMFBAUEBgkFAgQFAwULBAcCBAUFAwQDAwUFCAQEBwUGCQYGBQcFBwYEBAUGBQYDBQMICQUFCQMGBgUEBwMGAgcFBQkGBgQECAcEBgUFBgMEBgUDBQMFBAMIBQUFBAUEBggGAwQEBAYHBAUGBgQFCQIDBQUEAwYFAwEFBQYDBAUDBgYECAgDCAMFAwMGBAYGAwQEAgYHBgUGBAUCBAUFCgUFBAUFBgQDBAQHAgYEBAcFAwMGBAUECgUFBAUEBAQEBAMECAQFBAUGBAYEBwUEBQQECAcGAwMECAYFAwQDBQMCCAkHCwUFBQMGCgUJBAMEBQYFBAQFBAoDBgcEBgYGAwQFBAUDCAYDBgUECgYFBQMGBQUCAwYFBAQGBQQHCQYDBQMHAwQEBQUGBgUFAwcHBwYDAwgGBAkEBxIWABUAFwAAAAAAABBAFuLjeBYAFgAcABwAABJqc2Vzc2lvbmlkX2hhcHJveHkYDmAAAAHlAAAB/wD/ALYAERYAFQgXtWqJHNurA0AW4uN4FgAWABwAHAAACGhvcmFfaGl0GIAICAcHBQgGCAQEBwQIBwUHBQYHBgUFCAgHBQYHBggHBQcHBwUIBwQEBQUIBQQEBQgLBwcGBQkJBwUECQYHBgcIBgMHBQcFBgsHCAUGBAcFCQYEBQUGBAQGCQUGBgYHCQcGBQkLCgUFCgUHBAUFBgUGCAcHCAUGBwcGBwcGBgYFBwcGCQYGBgYFCAUHBgYJCgUKCwYFCAkLBgQHBAYGCQYGBwYGBAoHCAkIBAUGBQQFCAQFBQUFBQgGCQYIBAcGBwYGBwUICAUHBwcFBAYFCwYFBwUEBAQICAYHBggLBQgHBAUGBQsKBgQFBAQIBgUHCQcJBAUEBgkGBwQGBgYNBQsJCAgFBgcGBQcFBQcJCQYHBQcGEwgFBgUFBQkHBwYKBAkGBgYICQUHBQYFBAgEBggKBgYHBQQHBwQFBwUJBQoJCQQKBQoIBgYFBQYKBggICQUHBQoHCAgFBQQEBwUJBQcGBwgFBwkHBwcEBgkGCQYJBwUGBQgFBgcJBwkFCAgFCAgJBQcJBgQICQgIBgoGBgUGBgYGBgkHBgUFCgcGCAwIBggFBwUIBQYFBgUHBgoKBgUGBwYHBgUFBwYQBQUKAwYFBwYHBgYHBgcGBQYGBQUGCQgLBwYGBQYGCAYHBgcHBAYHBQgGCAQG |
...
...
Thanks again for your time and help.
Regards.
Created 05-26-2025 05:06 AM
Impala stats are stored in Backend DB metadata which is getting propagated from HS2 to Tez dag plan. That's the reason.
Above config helps to skip those impala stats.
hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.*
Created 05-26-2025 06:19 AM
Thanks for your answer.
Yes, I can see the info in the backend's table PARTITION_PARAMS divided in chunks.
If I understood it correctly, with the option set, the TEZ AM job won't be affected by that impala's metadata , even if I can still see those 'impala_intermediate_stats_chunk' messages in the plan 's output.
Thanks again for your time.
Regards.
Created 05-26-2025 06:45 AM
Yes, your understanding is correct
Created 05-26-2025 07:02 AM
Perfect, thanks four your help and quick responses.