Member since
08-24-2023
6
Posts
0
Kudos Received
0
Solutions
08-28-2025
12:03 AM
Hi All Under CDP 7.3.1, our Impala's log is flooding with messages like ORC read request to already read range. Falling back to readRandom. offset: 16248424 length: 105200 colrange_offset: 15988393 colrange_length: 365231 colrange_pos: 16353624 typeId: 94 kind: data filename: ....... from queries reading ACID ORC tables created from Hive. We've been searching about that and the only thing we've found is the code that triggers that message. void HdfsOrcScanner::ScanRangeInputStream::read(void* buf, uint64_t length,
uint64_t offset) {
Status status;
if (scanner_->IsInFooterRange(offset, length)) {
status = scanner_->ReadFooterStream(buf, length, offset);
} else {
ColumnRange* columnRange = scanner_->FindColumnRange(length, offset);
if (columnRange == nullptr) {
status = readRandom(buf, length, offset);
} else if (offset < columnRange->current_position_) {
VLOG_QUERY << Substitute(
"ORC read request to already read range. Falling back to readRandom. "
"offset: $0 length: $1 $2",
offset, length, columnRange->debug());
status = readRandom(buf, length, offset);
} else {
status = columnRange->read(buf, length, offset);
}
}
if (!status.ok()) throw ResourceError(status);
} I'd like to know the performance implications of those messages, if any (the messages are categorized as info), and in affirmative case, what we can do to solve this. Thanks in advance for your help and time. Regards.
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Impala
07-17-2025
02:18 AM
Hello We're testing the iceberg tables functionality on our CDP 7.3.1 environment, and after reviewing the documentation and performing some tests, it seems that the only way to compact the tables efficiently after modifications is via Spark. Impala's 'optimize table' command rewrites the whole table (the file size threshold option is not implemented till version 4.5), and Hive seems not able to compact them in this version 4 included in our CDP. The Spark procedures rewrite_position_delete_files and rewrite_data_files work correctly, but we wonder if we've misunderstood the documentation or maybe are missing any alternative. Thanks in advance for your help. Regards, Miguel
... View more
05-26-2025
07:02 AM
Perfect, thanks four your help and quick responses.
... View more
05-26-2025
06:19 AM
Thanks for your answer. Yes, I can see the info in the backend's table PARTITION_PARAMS divided in chunks. If I understood it correctly, with the option set, the TEZ AM job won't be affected by that impala's metadata , even if I can still see those 'impala_intermediate_stats_chunk' messages in the plan 's output. Thanks again for your time. Regards.
... View more
05-22-2025
11:07 PM
Thanks for your quick response. After adding the parameter to the whitelist and restarted the HS2, the property can be set at session-level without error. But the impala_intermediate_stats_chunk messages are still there, surely I'm doing something wrong. The following test was performed in our CDP-7.1.9 environment. HS2 version is 3.1.3000.7.1.9.1000-103 0: jdbc:hive2://docker-manager.desa.bigdata.t> set hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.*; No rows affected (0,012 seconds) 0: jdbc:hive2://docker-manager.desa.bigdata.t> set hive.plan.mapwork.serialization.skip.properties; +----------------------------------------------------+ | set | +----------------------------------------------------+ | hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.* | +----------------------------------------------------+ 1 row selected (0,026 seconds) 0: jdbc:hive2://docker-manager.desa.bigdata.t> explain extended select count(*) from tb_logs_internet_orc where anio=2025 and mes=5 and dia=1 and fichero_carga like '%gz'; +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | OPTIMIZED SQL: SELECT COUNT(*) AS `$f0` | | FROM `prometeo`.`tb_logs_internet_orc` | | WHERE `fichero_carga` LIKE '%gz' AND `anio` = 2025 AND `mes` = 5 AND `dia` = 1 | | STAGE DEPENDENCIES: | | Stage-1 is a root stage | | Stage-0 depends on stages: Stage-1 | | | | STAGE PLANS: | | Stage: Stage-1 | | Tez | | DagId: hive_20250523074558_870356c3-0a31-473e-9f15-93f02b670066:3 | | Edges: | | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) | | DagName: hive_20250523074558_870356c3-0a31-473e-9f15-93f02b670066:3 | ... ... | STATS_GENERATED TASK | | bucket_count 100 | | bucket_field_name fecha_hit | | file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | | file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | | impala.events.catalogServiceId eb75d6ab377d4a7f:816c40791e2fe086 | | impala.events.catalogVersion 42082 | | impala_intermediate_stats_chunk0 HBbi43gAGzGMC2Rlc2NyaXBjaW9uGAz/AP8AJgAAAv8A1wARFgAVABcAAAAAAAAAABbi43gWABYAHAAcAAAJb3JnYW5pc21vGAz/AP8A1AAAA/8AKQARFgAVFBcAAAAAAAAkQBbi43gWABYAHAAcAAANZmljaGVyb19jYXJnYRhaHQAAAVsAAAEkAAABFwAAAiQAAAISAAACEAAAAi0AAAKGAAACFQAABEgAAAElAAABFQAAAQAAAAEwAAABAwAAAR4AAARBAAADVgAAAkYAAAEFAAAFNQAAASgAERYAFWoXdbqaAOBkSUAW4uN4FgAWABwAHAAAA25pZhgUYAAAAf8ADQAAAf8A/wAzAAABWQARFgAVEhdAjdbyFcIgQBbi43gWABYAHAAcAAAQdGFtYW5pb19wZXRpY2lvbhiACAIBAgQBAAABAwQAAAEEAwIBAQQCAAEBAgABAQECAwEAAwMBAgEAAQQCAgUCBAAFAwABAgIEBgMBAwIGBAACAQEABAQFAQEFAgMCAQQCAgMCAQEBAgMDBQEAAwACAQUFAgICAQICAQICAgMEAAACAAUCAgABAAIBAgQBAgIBAAEDBQQFBAICAQAAAwMAAAQEAgMCAQAABQACAAMFAgUGAwEBAQEBAgEEBgUABAgAAwIBAgEBAQUAAAEEAgAAAgICAAYBAQEDAQIABAUEAwIABQIBAQYAAQIAAgIHCgMEAwIABAMDAgIAAAIAAwEAAQAFAQEEAQQEBAICAgEAAAUDAAMCAQQCAgIFBgECAAMBAgEBAwUDAAYCAwECCAEBAgQBAgEDAgMFBQECAQUBBAcEAAUBBAMFAAIBAQEEAgEBAQMGBAEAAwEDAAEAAwEBAAIBAQICCQAAAwMBAQMCAwADAQIAAQABAQIDAgECAgADAgIFAwMBAAIDAwUDAAAEAwACAgIJAQIEAQAEAgQGBQICAAYDAwIAAQEEAgICAgcEAAEDAQIAAgABBAABAQMBBAIAAAEBAQECAQEGAQMCBQICAgIDAgIBAgECAAACAwcCBQAEBQEBAgEEBwEBAAIAAgMBBgEBAAEFAgIFAQYCAgICAAMCAAICAAADAQAAAQEBAQIDAgABAAcGAQQEAgMCBAECAQIDAwABAQEBAQMFAwIGAQAECQEEAgUBAAAAAwEBAgMBBgEBAgICAQEAAgMDBQMECAEBBgMCBwADAwICBQUCBQMDAgUDBQIAAQICAQEBAQAAAgECAAMAAgACAgAHAQMDAQEAAwIDAgIBAgIAAwcAAwEAAAMBAQEBAQADBAEBAQIBAQEDAAEACQEAAwICAAABAwIEAAIBAAAAAQEHAAIBBAcAAAkBAAADAwQFAQQAAwICBAECAQACBgADAwYDAQMAAwEBAwQFAgIBAwUCAQECAQIFBQcABAEDAwkCAAAAAgECAAABAQMCAwUAAQMABAMEBgIEAgQBAwEEAgIDAgEBAgEEBAEGBAAABAYGBAMDAgMCAQQEAwAAAgMBAQUDAAsBAgEDAAICAAEAAAMAAgMCAgICAwAFAQILAAIHAwIABwABAAIGAQEAAAIBAgEAAgECAQMEAgECAAACBAUCAwQAAgIAAQMDCAUAAQQAAwMDAQECAQUDAwECAwEBAgQBAAMDBgECAgIEAQEDAwEFAQEDBAQBAAEDAwQCAwMDAgMAAwABAgMBBAEAAwEBAQECBwICAgEDAwIDAQMDAwMAAAMBAgMBAQEEAQACBgEABAECAQIAAQUDBgEDAQIGAgEAAgUBAQIAAgIDBQQBAAUABQQBAwISFgAVABcAAAAAAAAgQBbi43gWABYAHAAcAAAPeF9mNV9zb3VyY2Vwb3J0GIAIBQUDBgUFAQYFBAgGBgYGBQUGBAMHBwUEBgQGAwMGCgQDBQQFBQMEBg4FAwUDBwUEAwUHBQYFBAIFCAwFBAcIBwsEBQMEBAYEAgUEBAMFAgQHBAQFBAQHBwkFAwYGDgYFBAUDBAkFAwQHCQUFCAYDBQYFBgMFAwoKBAYEBAUDBAUFAwkDAw0EBgYEBgQCAwQIBgQEBAUDAwUDBQQGBgcGBQMDBwQFBgUJBwgDAwUDCAQHBgYGBAQDBgMHBAUEBAUDBQUDBgQJBQQEBAgFBAUFAwUDBwQFBQIEBAYHBQQFBgYHBwMDCwQGAwwFBAcHBAcDCAkDAwQEAwkEBAUKAwUIBQYFBQUGBwQFCAMFBgQDAwQEBwIGAwYDAwcHBQcDAwUHBgcHBAUEBwcGBwYGAwYDBQQEBAYHBAUFBAQGBAQIBAkEAgYGAwMFBAUKBAMGBAcEBwMDBAMEBgQFBAMFBAUHBAkEBQQHAwQEBAMEBwQDBQUEBggEAwIEBwIEBAQGBAQFCAcFBQsHBAIEBAYHBgUFBAQIBQgGBQIFBggDBwcFBQUDBAgEAwUFBwQHBAQDBAQFBQQMBgcGAwUECgQGAwcFBAQGBAcECAoGAwUCBAQHBQQDBAUFBAQFAgUEBQUGBwMHBQQHBAQDCAYGBwQEBwMEBAYHBQUGBgcDDAcEAgcFBwYFAwUFBQYEAgMFBQwIBAYDBgMDAwgICAYFBAcEAgMEAwUEBgQGBQYEAwUHAgUFBQUDBQcEAwQEBAcDBAcHBAQFBAMFAwMJBgYEBAcDBgYEBQIEBQYHBQYIBAQDAwUFBAUICQcEBwUKDAQCBwUGBgQFBAYEAwYDCAMHCAQDAggEAwMFBQcFBgMEBgUDBQQDBAYEAwMDBwcFBgQEBQQICAMGBAQGBwUFBAMHBggGAgQIBAQFAwMEBgQEBAQEBQUEBgQEBQgGBAgEBgMFBAUEBgkFAgQFAwULBAcCBAUFAwQDAwUFCAQEBwUGCQYGBQcFBwYEBAUGBQYDBQMICQUFCQMGBgUEBwMGAgcFBQkGBgQECAcEBgUFBgMEBgUDBQMFBAMIBQUFBAUEBggGAwQEBAYHBAUGBgQFCQIDBQUEAwYFAwEFBQYDBAUDBgYECAgDCAMFAwMGBAYGAwQEAgYHBgUGBAUCBAUFCgUFBAUFBgQDBAQHAgYEBAcFAwMGBAUECgUFBAUEBAQEBAMECAQFBAUGBAYEBwUEBQQECAcGAwMECAYFAwQDBQMCCAkHCwUFBQMGCgUJBAMEBQYFBAQFBAoDBgcEBgYGAwQFBAUDCAYDBgUECgYFBQMGBQUCAwYFBAQGBQQHCQYDBQMHAwQEBQUGBgUFAwcHBwYDAwgGBAkEBxIWABUAFwAAAAAAABBAFuLjeBYAFgAcABwAABJqc2Vzc2lvbmlkX2hhcHJveHkYDmAAAAHlAAAB/wD/ALYAERYAFQgXtWqJHNurA0AW4uN4FgAWABwAHAAACGhvcmFfaGl0GIAICAcHBQgGCAQEBwQIBwUHBQYHBgUFCAgHBQYHBggHBQcHBwUIBwQEBQUIBQQEBQgLBwcGBQkJBwUECQYHBgcIBgMHBQcFBgsHCAUGBAcFCQYEBQUGBAQGCQUGBgYHCQcGBQkLCgUFCgUHBAUFBgUGCAcHCAUGBwcGBwcGBgYFBwcGCQYGBgYFCAUHBgYJCgUKCwYFCAkLBgQHBAYGCQYGBwYGBAoHCAkIBAUGBQQFCAQFBQUFBQgGCQYIBAcGBwYGBwUICAUHBwcFBAYFCwYFBwUEBAQICAYHBggLBQgHBAUGBQsKBgQFBAQIBgUHCQcJBAUEBgkGBwQGBgYNBQsJCAgFBgcGBQcFBQcJCQYHBQcGEwgFBgUFBQkHBwYKBAkGBgYICQUHBQYFBAgEBggKBgYHBQQHBwQFBwUJBQoJCQQKBQoIBgYFBQYKBggICQUHBQoHCAgFBQQEBwUJBQcGBwgFBwkHBwcEBgkGCQYJBwUGBQgFBgcJBwkFCAgFCAgJBQcJBgQICQgIBgoGBgUGBgYGBgkHBgUFCgcGCAwIBggFBwUIBQYFBgUHBgoKBgUGBwYHBgUFBwYQBQUKAwYFBwYHBgYHBgcGBQYGBQUGCQgLBwYGBQYGCAYHBgcHBAYHBQgGCAQG | ... ... Thanks again for your time and help. Regards.
... View more
05-21-2025
07:00 AM
Hello. We have ORC tables loaded from Hive (Tez) and queried from Impala. Due to their number of partitions, we had to increase the TEZ AM task's heap, and the note https://my.cloudera.com/knowledge/Tez-AMtask-fails-with-OOME-because-of-huge-impala-stats-chunk?id=348603 seems to match our issue We'd like to reduce the AM memory footprint, but the solution doesn't work in our 7.1.9 and 7.3.1 environments: * After setting the option in the hive-site.xml, the "impala_intermediate_stats_chunk*" messages are still present in the Hive's plans. * Setting the option at the session level returns the error "Cannot modify hive.plan.mapwork.serialization.skip.properties at runtime. It is not in the list of params that are allowed to be modified at runtime (state=42000,code=1)". * https://github.com/apache/hive/pull/2620 shows that the option should be allowed to be set at the session level, as the HIVE-25501 was implemented in the CDP-7.1.7. After some research we came across the Jira https://issues.apache.org/jira/browse/HIVE-27114, where the option hive.metastore.partitions.parameters.exclude.pattern was introduced. Setting it to 'impala_intermediate_stats_chunk%', does the trick, as the impalachunks* messages no longer appear, but: * It only seems to work in the 7.3.1 Hive's version, so it doesn't apply to our 7.1.9 installation. * We're worried about this note, https://issues.apache.org/jira/browse/IMPALA-12743, solved in Impala 4.4, where it seems that Impala would miss partition stats due to Hive's filtering after using the previous option. We'd like to know: * What are we doing wrong with the hive.plan.mapwork.serialization.skip.properties option in the CDP 7.1.9 installation ? * Is it safe to use the hive.metastore.partitions.parameters.exclude.pattern option in CDP 7.3.1 ? Thanks in advance for your help and time. Regards.
... View more
Labels: