About iamsaaj

iamsaaj · ‎08-06-2019

Thanks for the reply! The views are already built by joining many underlying tables, so joining the views again for data aggregation will cause performance issues. Here are the two approaches I came up with.

Approach 1:
1. Extract data from the Hive views into files.
2. Create intermediate Hive tables and load the data extracted from the views.
3. Join the new Hive tables to generate the final file.

Approach 2: Use PySpark to read data from the views directly, aggregate and transform the data, and generate the final output file.
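For illustration, the PySpark route could look roughly like the sketch below. It is only a minimal outline under stated assumptions: the view names, join key, grouping column, measure and output path are all hypothetical placeholders, not taken from the actual environment, and it assumes the views share a single join key and that the Spark session can reach the Hive metastore.

```python
from typing import Dict, List

# Hypothetical placeholders: replace with the real views, key and columns.
VIEWS = ["sales_db.v_orders", "sales_db.v_customers"]
JOIN_KEY = "customer_id"
GROUP_COLS = ["region"]
MEASURES = {"total_amount": "SUM(amount)"}


def build_query(views: List[str], join_key: str,
                group_cols: List[str], measures: Dict[str, str]) -> str:
    """Build one SQL statement that joins the views and aggregates the result."""
    base, *others = views
    joins = "".join(f"\nJOIN {v} USING ({join_key})" for v in others)
    select_list = ", ".join(
        group_cols + [f"{expr} AS {name}" for name, expr in measures.items()]
    )
    return (
        f"SELECT {select_list}\n"
        f"FROM {base}{joins}\n"
        f"GROUP BY {', '.join(group_cols)}"
    )


def run(output_path: str) -> None:
    """Read the Hive views through Spark SQL and write the aggregated output."""
    # Imported here so build_query can be checked without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("view-aggregation")
        .enableHiveSupport()  # lets spark.sql() resolve Hive views
        .getOrCreate()
    )
    result = spark.sql(build_query(VIEWS, JOIN_KEY, GROUP_COLS, MEASURES))
    # coalesce(1) produces a single part file inside output_path;
    # this is only sensible when the aggregated output is modest in size.
    result.coalesce(1).write.mode("overwrite").option("header", True).csv(output_path)
    spark.stop()
```

Calling run() with an output location would write a directory containing one CSV part file. Whether this performs better than the intermediate-table approach depends on how expensive the underlying views are, since Spark still has to evaluate them.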

Last Visited	‎08-06-2019 09:26 AM

Member Since	‎06-27-2019 05:55 AM
Posts	4

Cloudera Community

Re: Hadoop tools/technology and design recommendat...