Impala RuntimeException : file not found in cache

Explorer

Problem statement:
Querying an Iceberg table from Impala for the current date fails. The table receives data from a streaming pipeline at 5-minute intervals.

Example: select * from <table> where result_date="<current_date>" limit 1;



Error:
ImpalaRuntimeException: Cannot find file in cache:: Cannot find file in cache: hdfs://xx/ya/Zzz/data/resulted/00004-22575-da5239e5-71d0-4b2f-af6b-73cbf4b7d9c5-46884-00001.parquet with snapshot id: 2154647205402518684



Workarounds tried:

  • INVALIDATE METADATA or REFRESH: works for a few minutes until the next commit occurs, then the same error is thrown with a new file and a new snapshot id.
  • Setting the table properties below did not help:
    ALTER TABLE db.table_name SET TBLPROPERTIES (
    'metadata_refresh_interval_ms' = '60000',
    'refresh-before-read' = 'true'
    );

  • I also checked whether the properties below have any impact, but they do not seem to:
    write.metadata.delete-after-commit.enabled
    write.metadata.previous-versions-max
  • I cannot understand why this issue appears when Iceberg maintains isolation, especially since the same table can be queried via spark3-shell.
  • With the same table properties, some tables that get data from the same pipeline at the same interval can be queried successfully, while a few tables cannot.
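
For reference, the refresh workaround and a few checks that might help narrow this down can be sketched in Impala SQL as follows. This is only a sketch: `db.table_name` and `<current_date>` are the placeholders used above, and it assumes an Impala version that supports DESCRIBE HISTORY on Iceberg tables.

```sql
-- Reload the table's metadata in Impala (the workaround from the first bullet).
REFRESH db.table_name;

-- Re-run the failing query right after the refresh.
SELECT * FROM db.table_name WHERE result_date = '<current_date>' LIMIT 1;

-- List the snapshots Impala knows about, to compare against the snapshot id
-- in the error message (assumes DESCRIBE HISTORY is available).
DESCRIBE HISTORY db.table_name;

-- Show the table properties, to compare a failing table with a working one.
DESCRIBE FORMATTED db.table_name;
```

Running the last two statements on both a failing table and a working table should make any difference in snapshots or properties easier to spot.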

Any solution would be of great help.

2 REPLIES

Explorer

Hello @VidyaSargur,
Can you please help me understand and fix this issue?

It is critical, as end users are not able to query the tables.

Community Manager

@Muskan, thank you for reaching out to me. I am not a technical expert, but I have tagged our experts who can assist you. @willx and @ChethanYM, could you please help here?



Regards,

Vidya Sargur,
Community Manager

