Hi All,
Every day we follow the steps below for a full load (a rough PySpark sketch of steps 1-3 follows the list):
Step 1: Read data from parquet files into PySpark, apply transformations, and save the result as parquet files (with repartitioning and partitioning).
Step 2: Save the dataframe as a new external table in Hive under a temporary name; the table has complex datatypes.
Step 3: Create or alter a view in Hive on top of that table.
Step 4: The view is used by Tableau through Impala, and by PySpark for some other jobs.
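For context, here is roughly what the daily job looks like in PySpark. This is only a sketch: the database, table, view, path, and partition-column names (analytics.orders_tmp, analytics.orders_v, load_date, etc.) are placeholders, not our real ones.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("daily_full_load")
         .enableHiveSupport()   # so tables and views land in the Hive metastore
         .getOrCreate())

# Step 1: read the source parquet files and apply the transformations
df = spark.read.parquet("/data/raw/orders")
transformed = df  # ...transformations go here...

# Steps 1-2: write partitioned parquet and register it as a new external
# table under a temporary name (this table has complex datatypes)
(transformed
 .repartition("load_date")                      # placeholder partition column
 .write
 .mode("overwrite")
 .partitionBy("load_date")
 .option("path", "/data/curated/orders_tmp")    # external location
 .saveAsTable("analytics.orders_tmp"))

# Step 3: create or replace the Hive view on top of the temporary table
spark.sql("""
    CREATE OR REPLACE VIEW analytics.orders_v AS
    SELECT * FROM analytics.orders_tmp
""")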
The problem is that Impala cannot read complex datatypes, so the Hive view throws an error when queried through Impala. If we create the view in Impala instead, Impala ignores the complex-datatype columns while creating the view, so we can no longer read the complex datatypes from Spark.
The workarounds we know of are creating two views (one in Hive and one in Impala) on the same table, or renaming the newly created external table to the original table name. Currently we rename the table (sketched below). Can someone suggest a better approach?
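For reference, the rename swap we run today looks roughly like this (same placeholder names as above, with analytics.orders as the original table name):

# Swap the freshly loaded temporary table in under the original name;
# the previous table is renamed aside and dropped once the swap succeeds.
spark.sql("DROP TABLE IF EXISTS analytics.orders_old")
spark.sql("ALTER TABLE analytics.orders RENAME TO analytics.orders_old")
spark.sql("ALTER TABLE analytics.orders_tmp RENAME TO analytics.orders")
spark.sql("DROP TABLE analytics.orders_old")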
Thanks in advance