Member since
06-27-2019
4
Posts
0
Kudos Received
0
Solutions
08-06-2019
06:05 AM
Thanks for the reply ! Views are already created by joining many underlying table, hence joining the views again for data aggregation will result performance issue. Here are the two approach i came up with 1. Extract data from Hive view into files. 2. Create intermediate Hive tables and load data extracted from views. 3. Join the new hive tables to generate the final file. Another approach to use PySpark to read data from views directly , aggreate and transform the data and generate the final output file.
... View more
08-06-2019
06:04 AM
Thanks for the reply ! Views are already created by joining many underlying table, hence joining the views again for data aggregation will result performance issue. Here are the two approach i came up with 1. Extract data from Hive view into files. 2. Create intermediate Hive tables and load data extracted from views. 3. Join the new hive tables to generate the final file. Another approach to use PySpark to read data from views directly , aggreate and transform the data and generate the final output file.
... View more
07-27-2019
07:42 PM
I have to read data from two different hive views(probably two different databases) , extract data from those views and write it into files and then, join those files and perform data formatting and finally , write it into final file.
Could you please suggest me some tools/technologies and design for hadoop for this requirement.
... View more
Labels:
- Labels:
-
Apache Hive