Support Questions

Muskan · 06-26-2025

Problem statement:
When I query an Iceberg table for the current date, the query fails with the error below. The table receives data from a streaming pipeline every 5 minutes.

Example: select * from <table> where result_date="<current_date>" limit 1;

Error:
ImpalaRuntimeException: Cannot find file in cache:: Cannot find file in cache: hdfs://xx/ya/Zzz/data/resulted/00004-22575-da5239e5-71d0-4b2f-af6b-73cbf4b7d9c5-46884-00001.parquet with snapshot id: 2154647205402518684

Workarounds tried:

INVALIDATE METADATA or REFRESH: this works for a few minutes, until the next commit occurs, and then the same error is thrown again with a new file and a new snapshot ID.
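For reference, that workaround corresponds to running one of the following Impala statements before querying. `db.table_name` is the same placeholder used in the ALTER TABLE statement below; as described above, the effect only lasts until the next streaming commit.

```sql
-- Reload the table's file and snapshot metadata (lighter-weight option)
REFRESH db.table_name;

-- Or mark all cached metadata for the table as stale so it is fully
-- reloaded on next access (heavier option)
INVALIDATE METADATA db.table_name;
```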
Setting the following table properties did not help either:
ALTER TABLE db.table_name SET TBLPROPERTIES (
'metadata_refresh_interval_ms' = '60000',
'refresh-before-read' = 'true'
);
I also checked whether the following properties have any impact, but they do not seem to:
write.metadata.delete-after-commit.enabled
write.metadata.previous-versions-max
I don't understand why this error occurs, since Iceberg provides snapshot isolation. The same table can be queried successfully via spark3-shell.
Also, other tables that receive data from the same pipeline, at the same interval and with the same table properties, can be queried successfully; only a few tables are affected.

Any solution would be of great help.

Muskan · 06-26-2025

Hello @VidyaSargur
Can you please help me understand and fix this issue?

It's critical, as end users are not able to query the tables.

VidyaSargur · 06-29-2025

@Muskan, thank you for reaching out to me. I am not a technical expert, but I have tagged our experts who can assist you. @willx and @ChethanYM, could you please help here?

Regards,

Vidya Sargur,
Community Manager

Impala RuntimeException : file not found in cache