About iamsaaj

iamsaaj · ‎08-06-2019

Thanks for the reply! The views are already created by joining many underlying tables, so joining the views again for data aggregation will cause performance issues. Here are the two approaches I came up with.

First approach:
1. Extract data from the Hive views into files.
2. Create intermediate Hive tables and load the data extracted from the views.
3. Join the new Hive tables to generate the final file.

Second approach: use PySpark to read data from the views directly, aggregate and transform the data, and generate the final output file.
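For what it's worth, the PySpark approach could look roughly like the sketch below. This is only an illustration under assumptions: the view names (`db1.customer_view`, `db2.orders_view`), the column names (`customer_id`, `region`, `amount`), the helper names, and the output path are all hypothetical and not from the actual setup. It assumes Spark is configured with access to the Hive metastore.

```python
def build_join_query(left_view, right_view, join_key, group_col, amount_col):
    """Build a Spark SQL statement that joins two Hive views and aggregates.

    All view and column names passed in are hypothetical placeholders.
    """
    return (
        f"SELECT l.{group_col} AS {group_col}, "
        f"SUM(r.{amount_col}) AS total_{amount_col} "
        f"FROM {left_view} l "
        f"JOIN {right_view} r ON l.{join_key} = r.{join_key} "
        f"GROUP BY l.{group_col}"
    )


def run(output_path):
    """Read from the two views, aggregate, and write delimited output."""
    # Imported here so the query builder above can be used without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-view-aggregation")
        .enableHiveSupport()  # lets spark.sql() resolve Hive views
        .getOrCreate()
    )

    # Hypothetical views in two different databases.
    query = build_join_query(
        "db1.customer_view", "db2.orders_view",
        "customer_id", "region", "amount",
    )
    result = spark.sql(query)

    # Writes a directory of CSV part files with a header row.
    result.write.mode("overwrite").option("header", True).csv(output_path)
    spark.stop()


# Example invocation (e.g. in a script run with spark-submit):
# run("/tmp/final_output")
```

Note that Spark writes a directory of part files rather than a single file; if one file is required, the result can be reduced to a single partition before writing, at the cost of parallelism.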

iamsaaj · ‎07-27-2019

I have to read data from two different Hive views (probably in two different databases), extract the data from those views and write it to files, then join those files, format the data, and finally write the result to a final file. Could you please suggest some tools/technologies and a design on Hadoop for this requirement?

Last Visited	‎08-06-2019 09:26 AM

Member Since	‎06-27-2019 05:55 AM
Posts	4

Re: Hadoop tools/technology and design recommendat...

Hadoop tools/technology and design recommendation