We have a use case to reload all transaction data every month for a defined set of years. We will use Spark to build the required reporting tables, and Impala for analytical workloads via a BI tool.
How do we separate the data-processing tables from the reporting tables, and then swap them in Impala? We want to minimise the impact on users in terms of BI system availability and ensure read consistency.
I've come across a couple of options: partitioning (but we'd need to copy or move the files into the reporting directory location, so that the data-processing tables are free for the next run),
and the other option is to lock the table, remove the old files, and move files from the working directory to the reporting directory (which again impacts users for the duration of the removal and move).
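One variation we're considering instead of moving files: swap at the metastore level with Impala's ALTER TABLE ... RENAME TO, since a rename is a quick metadata/HDFS-directory operation rather than a data copy. A minimal sketch of the DDL sequence (the staging/reporting table names below are hypothetical placeholders for our actual ones):

```python
def swap_statements(staging="staging.transactions",
                    reporting="reporting.transactions",
                    retired="staging.transactions_old"):
    """Build the Impala DDL sequence for a rename-based table swap.

    Each RENAME is a fast metastore/directory operation, not a data
    copy, so the window where readers could miss the reporting table
    is limited to the gap between the two renames.
    """
    return [
        f"ALTER TABLE {reporting} RENAME TO {retired}",
        f"ALTER TABLE {staging} RENAME TO {reporting}",
        # Keep `retired` around briefly for rollback, then drop it.
    ]

for stmt in swap_statements():
    print(stmt)
```

The open question with this approach is the brief gap between the two renames, during which a query could fail to find the reporting table.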
Ideally, we could maintain two databases and swap between them whenever a data load completes.
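Since Impala has no "rename database" operation, one way we imagine approximating the two-database swap is to have the BI tool read through a stable view and repoint that view with ALTER VIEW after each load. A sketch, assuming hypothetical view and database names; ALTER VIEW is a single metadata change, so in-flight queries finish against the old table while new queries see the new one:

```python
def repoint_view(view="reporting.transactions_v", active_db="txn_a"):
    """Return the ALTER VIEW statement that flips readers to the
    database holding the freshly loaded data.

    The BI tool only ever queries `view`; after each monthly load we
    alternate `active_db` between the two underlying databases.
    """
    return f"ALTER VIEW {view} AS SELECT * FROM {active_db}.transactions"

print(repoint_view(active_db="txn_b"))
```

Would this view-repointing pattern give us the read consistency we're after, or is there a more standard approach?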