
Impala's log "ORC read request to already read range" messages

Explorer

Hi All

Under CDP 7.3.1, our Impala logs are being flooded with messages like

ORC read request to already read range. Falling back to readRandom. offset: 16248424 length: 105200 colrange_offset: 15988393 colrange_length: 365231 colrange_pos: 16353624 typeId: 94 kind: data filename: .......

from queries reading ACID ORC tables created in Hive.

We've searched for information on this, and the only thing we've found is the code that emits the message:

void HdfsOrcScanner::ScanRangeInputStream::read(void* buf, uint64_t length,
    uint64_t offset) {
  Status status;
  if (scanner_->IsInFooterRange(offset, length)) {
    // Requests inside the file footer range are served by the footer stream.
    status = scanner_->ReadFooterStream(buf, length, offset);
  } else {
    ColumnRange* columnRange = scanner_->FindColumnRange(length, offset);
    if (columnRange == nullptr) {
      // No tracked column range covers this request: fall back to a
      // positioned (random) read.
      status = readRandom(buf, length, offset);
    } else if (offset < columnRange->current_position_) {
      // The sequential stream for this column range has already advanced past
      // the requested offset, so the request cannot be served from the
      // read-ahead stream. Log it and fall back to a random read.
      VLOG_QUERY << Substitute(
          "ORC read request to already read range. Falling back to readRandom. "
          "offset: $0 length: $1 $2",
          offset, length, columnRange->debug());
      status = readRandom(buf, length, offset);
    } else {
      // Normal case: serve the request sequentially from the column range.
      status = columnRange->read(buf, length, offset);
    }
  }
  if (!status.ok()) throw ResourceError(status);
}

I'd like to know whether these messages have any performance implications (they are logged at INFO level) and, if they do, what we can do to resolve this.

Thanks in advance for your help and time.

Regards.

1 REPLY

Master Collaborator

@AEAT The log message "ORC read request to already read range. Falling back to readRandom" is a sign of a suboptimal read pattern. While not a fatal error, it means Impala is not reading the ORC file as efficiently as it could.

Impala's ORC scanner is designed to read data in a sequential, read-ahead fashion to optimize I/O from HDFS. It attempts to predict what data a query will need next and reads it in large, efficient chunks.

-> Random reads are slower than sequential reads on both spinning disks and SSDs.

-> Seeking to a different location in the file and then reading a small chunk of data adds CPU and I/O overhead to every request.

-> The cumulative effect of these inefficient reads can add significant time to a query's execution, especially for large datasets.

 

The most common cause of this issue is a large number of small files. For ACID tables, the many small delta files left behind by frequent inserts and updates have the same effect. Impala has to issue a separate set of I/O requests for each file, which disrupts the efficient sequential read pattern. Please check whether your tables consist of many small files and, if so, compact them into files sized close to the HDFS block size, as sketched below.
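
For example (my_db.my_table below is a placeholder), you can inspect the table's file layout from Impala and, since these are ACID tables created in Hive, trigger a Hive major compaction to merge the small base and delta files:

-- In impala-shell: list the data files backing the table to check for many small files
SHOW FILES IN my_db.my_table;

-- In Hive (e.g. beeline): run a major compaction to merge the base and delta files
ALTER TABLE my_db.my_table COMPACT 'major';

-- Track the compaction's progress until it completes
SHOW COMPACTIONS;

-- Back in impala-shell: pick up the compacted file layout
REFRESH my_db.my_table;

A major compaction rewrites the base and delta files into a single base per partition, which restores the large sequential reads the ORC scanner expects.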

Also monitor resource usage while the query runs so you can quantify the impact.
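
For instance, in impala-shell you can fetch the profile of the query you just ran and check the HDFS scan node's I/O counters (a sketch; the exact counter names can vary between Impala versions):

-- In impala-shell, immediately after running the affected query:
PROFILE;
-- In the output, find the HDFS_SCAN_NODE section and compare counters such as
-- BytesRead and TotalRawHdfsReadTime to see how much time the scan spends on I/O.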

 

Regards,

Chethan YM