Let's assume the data is in the LLAP cache. What problems are associated with changing the HDFS file while its data is in the LLAP cache?
When LLAP stores data in its cache, it also stores an identifier for the file the data came from. For HDFS, that is by default the file inode ID; for other file systems (e.g. S3), it is the combination of name, size, and modification timestamp. The HDFS inode ID (and, on other file systems, the time and size) changes on appends, so the cached data is invalidated and the file is cached again under a different ID. The old data is currently not proactively removed, but queries no longer use it, and it will eventually be evicted.
However, ORC, Parquet, and similar files generally do not use the append pattern: a file is sealed once it is written.
For example, Hive ACID writes new files (deltas) and then compacts the old base file and the deltas into a new base file. In this case, the new files' data is cached under a new ID.
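To make the keying behavior concrete, here is a minimal Python sketch of a cache keyed by file identity. It is a toy model only, not LLAP's actual code: `FileKey`, `ToyCache`, and the warehouse path are hypothetical names made up for illustration, and the key mimics the name, size, and modification-time combination described for non-HDFS file systems.

```python
from collections import namedtuple

# Hypothetical stand-in for the (name, size, modification time) identity
# described above for non-HDFS file systems.
FileKey = namedtuple("FileKey", ["path", "size", "mtime"])


class ToyCache:
    """Toy cache keyed by file identity; not LLAP's real implementation."""

    def __init__(self):
        self._entries = {}

    def put(self, key, data):
        self._entries[key] = data

    def get(self, key):
        # Returns None when no entry matches this exact identity.
        return self._entries.get(key)

    def keys(self):
        return set(self._entries)


cache = ToyCache()

# First read: the file is cached under its current identity.
# The path is an illustrative assumption.
before = FileKey("/warehouse/t/base_1/file0", size=1000, mtime=100)
cache.put(before, "old data")

# An append changes the size and modification time, so the identity changes.
after = FileKey("/warehouse/t/base_1/file0", size=1500, mtime=200)

stale_hit = cache.get(after)        # the old entry does not match the new key
cache.put(after, "new data")        # the file is cached again under the new key
old_entry_remains = before in cache.keys()  # old entry lingers until evicted
```

A query that reads the changed file looks it up under the new key, so it never sees the stale entry. The same reasoning applies to files produced by compaction: they are new files, so they get new keys.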