Hi all,
Under CDP 7.3.1, our Impala logs are being flooded with messages like this one:
ORC read request to already read range. Falling back to readRandom. offset: 16248424 length: 105200 colrange_offset: 15988393 colrange_length: 365231 colrange_pos: 16353624 typeId: 94 kind: data filename: .......
These messages come from queries that read ACID ORC tables created in Hive.
We have searched for this message, and the only thing we have found is the code that emits it:
void HdfsOrcScanner::ScanRangeInputStream::read(void* buf, uint64_t length,
    uint64_t offset) {
  Status status;
  if (scanner_->IsInFooterRange(offset, length)) {
    status = scanner_->ReadFooterStream(buf, length, offset);
  } else {
    ColumnRange* columnRange = scanner_->FindColumnRange(length, offset);
    if (columnRange == nullptr) {
      status = readRandom(buf, length, offset);
    } else if (offset < columnRange->current_position_) {
      VLOG_QUERY << Substitute(
          "ORC read request to already read range. Falling back to readRandom. "
          "offset: $0 length: $1 $2",
          offset, length, columnRange->debug());
      status = readRandom(buf, length, offset);
    } else {
      status = columnRange->read(buf, length, offset);
    }
  }
  if (!status.ok()) throw ResourceError(status);
}
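To make sure we are reading that branch correctly, here is a minimal sketch of the decision, fed with the numbers from the log line above. It is only a simplified model, not Impala's code: ColumnRangeSketch, Contains and Classify are hypothetical names, the footer branch is left out, and the rule for matching a request to a column range (the request lies fully inside the range) is our assumption about what FindColumnRange does.

```cpp
#include <cstdint>

// Simplified stand-in for the scanner's ColumnRange. The fields mirror the
// values printed in the log line, not the real Impala class.
struct ColumnRangeSketch {
  uint64_t offset;            // colrange_offset
  uint64_t length;            // colrange_length
  uint64_t current_position;  // colrange_pos
};

enum class ReadPath { kRandomNoRange, kRandomAlreadyRead, kSequential };

// Assumption: a request belongs to a range when it lies fully inside it.
constexpr bool Contains(const ColumnRangeSketch& r, uint64_t offset,
                        uint64_t length) {
  return offset >= r.offset && offset + length <= r.offset + r.length;
}

// Mirrors the three non-footer branches of the read() method quoted above.
constexpr ReadPath Classify(const ColumnRangeSketch& r, uint64_t offset,
                            uint64_t length) {
  return !Contains(r, offset, length)
             ? ReadPath::kRandomNoRange
             : (offset < r.current_position ? ReadPath::kRandomAlreadyRead
                                            : ReadPath::kSequential);
}

// Values taken from the log line in this post.
constexpr ColumnRangeSketch kLogged{15988393ULL, 365231ULL, 16353624ULL};
constexpr uint64_t kReqOffset = 16248424ULL;
constexpr uint64_t kReqLength = 105200ULL;

// The logged request starts before colrange_pos, so it takes the
// "already read range" branch and falls back to readRandom.
static_assert(Classify(kLogged, kReqOffset, kReqLength) ==
                  ReadPath::kRandomAlreadyRead,
              "logged request should hit the already-read branch");

// In this log line the stream position is at the end of the column range,
// and the request ends exactly at that position.
static_assert(kLogged.offset + kLogged.length == kLogged.current_position,
              "colrange_pos is at the end of the range");
static_assert(kReqOffset + kReqLength == kLogged.current_position,
              "request ends at colrange_pos");
```

If this model is right, the logged request asks again for the last 105200 bytes of a column range that the scanner had already consumed to its end, so those bytes are fetched a second time through readRandom instead of being served from the sequential column stream.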
I'd like to know whether these messages have any performance implications (they are logged at info level) and, if so, what we can do to resolve the issue.
Thanks in advance for your help and time.
Regards.