We have a use case where we run an ETL written in spark on top of some streaming data, the ETL writes results to the target hive table every hour, but users are commonly running queries to the target table and we have faced cases of having query errors due to spark loading the table at the same time:
java.io.FileNotFoundException: File does not exist: <HDFS path>
What alternatives do we have to avoid or minimize this errors? Any property to the spark job(or to the hive table)? or something like creating a temporary table?