
How to handle data loading in Hive


Hi All,


Every day we follow the steps below for a full load:

Step 1: Read data from Parquet files into PySpark, apply transformations, and save the result as partitioned (and repartitioned) Parquet files.

Step 2: Save the DataFrame as a new external table in Hive (under a temporary name; the schema contains complex datatypes).

Step 3: Create or alter a view in Hive over the new table.

Step 4: The view is consumed by Tableau (via Impala) and by PySpark for some other jobs.
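For context, a minimal sketch of the four steps above in PySpark. All database, table, view, and path names here are hypothetical, the `spark` session is assumed to be Hive-enabled, and `current_date()` stands in for the real transformation logic:

```python
def run_daily_full_load(spark, source_path, staging_path,
                        db="mydb", tmp_table="orders_tmp", view="orders_v"):
    """Daily full load: read, transform, write partitioned Parquet,
    register it as an external Hive table, then repoint the view."""
    # Step 1: read the source Parquet files and apply transformations
    # (current_date() is a placeholder transformation)
    df = spark.read.parquet(source_path)
    df = df.selectExpr("*", "current_date() AS load_date")

    # Step 1 (cont.) + Step 2: repartition by the partition column and
    # write as a partitioned external table under a temporary name
    (df.repartition("load_date")
       .write.mode("overwrite")
       .partitionBy("load_date")
       .option("path", staging_path)       # external location
       .saveAsTable(f"{db}.{tmp_table}"))

    # Step 3: create or replace the Hive view over the new table;
    # Step 4 consumers (Tableau/Impala, PySpark) then read this view
    spark.sql(f"CREATE OR REPLACE VIEW {db}.{view} AS "
              f"SELECT * FROM {db}.{tmp_table}")
```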


The problem is that Impala cannot read complex datatypes, so the Hive view throws an error in Impala. If we instead create the view in Impala, it ignores the complex-datatype columns when creating the view, so we can no longer read those columns from Spark.


The solutions we have considered are: create two views (one in Hive, one in Impala) over the same table, or rename the newly created external table to the original table name. Currently we rename the table. Can someone suggest a better approach?
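For reference, the rename-based swap we do today can be expressed as a short Hive DDL sequence that stages the old table under a suffix before dropping it. A sketch (all names hypothetical), built as plain strings so it can be handed to `spark.sql` or any other Hive client:

```python
def table_swap_statements(db, table, tmp_table, old_suffix="_old"):
    """Return the Hive DDL sequence that swaps the freshly loaded
    temporary table into place under the original table name."""
    old = f"{db}.{table}{old_suffix}"
    return [
        f"DROP TABLE IF EXISTS {old}",                           # clear leftovers
        f"ALTER TABLE {db}.{table} RENAME TO {old}",             # stage old copy
        f"ALTER TABLE {db}.{tmp_table} RENAME TO {db}.{table}",  # swap in new
        f"DROP TABLE IF EXISTS {old}",                           # clean up
    ]

# Example: statements for swapping mydb.orders_tmp into mydb.orders
for stmt in table_swap_statements("mydb", "orders", "orders_tmp"):
    print(stmt)
```

Keeping the old copy around until the new table is renamed in means readers hit a missing table only in the brief window between the two `ALTER TABLE ... RENAME` statements.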


Thanks in advance