
What happens when hdfs files are changed when data is in LLAP cache ?


Let's assume the data is in the LLAP cache. What problems arise if the HDFS file is changed while its data is cached in LLAP?


Re: What happens when hdfs files are changed when data is in LLAP cache ?


LLAP stores a file identifier along with the data it caches. For HDFS, that is by default the file inode ID; for other file systems (e.g. S3), it's the combination of name, size, and modification timestamp. The HDFS inode ID (and, on other file systems, the timestamp and size) changes on appends, so the cached data is invalidated and the file's data is cached again under a different ID. The old data is currently not proactively removed, but it is no longer used by queries and will eventually be evicted.
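
To make the keying idea concrete, here is a rough Python sketch. It is not LLAP code: `FileKey` and `IdentityKeyedCache` are hypothetical names, the key mimics the name/size/timestamp combination described for non-HDFS file systems, and a plain LRU policy stands in for whatever eviction policy LLAP actually uses. It only shows why a changed file misses the cache while the stale entry lingers until evicted.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional


class FileKey(NamedTuple):
    """Hypothetical stand-in for the file identity stored with cached data."""
    path: str
    size: int
    mtime: int


class IdentityKeyedCache:
    """Toy cache keyed by file identity, with simple LRU eviction (an assumption)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[FileKey, bytes]" = OrderedDict()

    def get(self, key: FileKey) -> Optional[bytes]:
        data = self._entries.get(key)
        if data is not None:
            # A hit makes the entry the most recently used one.
            self._entries.move_to_end(key)
        return data

    def put(self, key: FileKey, data: bytes) -> None:
        self._entries[key] = data
        self._entries.move_to_end(key)
        # Nothing is removed proactively; entries only leave under memory pressure.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def __contains__(self, key: FileKey) -> bool:
        return key in self._entries


cache = IdentityKeyedCache(capacity=2)

# Original version of the file is read and cached.
v1 = FileKey("s3://bucket/t/part-0.orc", size=100, mtime=1000)
cache.put(v1, b"old contents")

# The file is changed: size and timestamp differ, so the identity differs.
v2 = FileKey("s3://bucket/t/part-0.orc", size=150, mtime=2000)
miss = cache.get(v2)                 # lookup under the new identity misses
cache.put(v2, b"new contents")
stale_still_resident = v1 in cache   # old entry is unused but not yet removed

# Later activity pushes the stale entry out.
cache.put(FileKey("s3://bucket/t/part-1.orc", size=80, mtime=1500), b"other")
stale_evicted = v1 not in cache
```

Queries never see the old contents because they always look up the file's current identity; the stale entry only costs cache space until it is evicted.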

However, ORC, Parquet, and similar files are generally not written with the append pattern; the file is sealed once it is written.

For example, Hive ACID writes new files (deltas) and then compacts the old base file and the deltas into a new base file. In this case, the new files' data is cached under new IDs.
