New Contributor
Posts: 1
Registered: 11-17-2016

Offloading ETL to Hadoop

We are serious about offloading ETL to Hadoop. The existing data models are highly normalized in the current integration layer in the database. Below are specific questions regarding this effort:

 

1. Dimension modelling

    Star schema has been tried on Hive, but the numerous joins have been a bit of an overkill (queries run long and error out). Are there recommendations on file formats and compression to suit this use case? Are there alternate modelling techniques that can be used?
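
For example, would flattening the busiest dimensions into the fact table and storing it as ORC (or Parquet) with compression, partitioning and bucketing be the right direction? A rough sketch of what we have in mind, where every table and column name is a placeholder:

    -- Sketch only: all names are illustrative. ORC with compression plus
    -- partitioning/bucketing is intended to cut scan and join cost;
    -- flattening the busiest dimensions removes those joins entirely.
    CREATE TABLE sales_fact_flat (
        order_id       BIGINT,
        customer_name  STRING,        -- denormalized from the customer dimension
        product_name   STRING,        -- denormalized from the product dimension
        quantity       INT,
        amount         DECIMAL(18,2)
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (order_id) SORTED BY (order_id) INTO 32 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('orc.compress' = 'ZLIB');

Would bucketing and sorting on the surviving join key let Hive use bucket map joins or sort-merge-bucket joins for the dimension joins that remain?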


2. Surrogate key/foreign key - Normalization

    Generation of surrogate keys is something we feel is essential at the data warehouse end. Is there a way of generating keys similar to sequences on a database?
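
For example, would offsetting ROW_NUMBER() over the incoming batch by the current maximum key be an acceptable substitute for a sequence, assuming only one load writes to the dimension at a time? A sketch with placeholder names (dim_customer, stg_customer, customer_sk):

    -- Sketch only: assumes a single writer per load; concurrent loads would
    -- need coordination. Newer Hive releases also appear to offer a
    -- SURROGATE_KEY() UDF for ACID tables as an alternative.
    INSERT INTO TABLE dim_customer
    SELECT
        mk.max_key + ROW_NUMBER() OVER (ORDER BY s.customer_nk) AS customer_sk,
        s.customer_nk,
        s.customer_name
    FROM stg_customer s
    CROSS JOIN (SELECT COALESCE(MAX(customer_sk), 0) AS max_key
                FROM dim_customer) mk;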


3. Data refresh for lookups without incremental CDC

    What could be the ideal way to refresh lookups in Hadoop?
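
For example, for small lookups, is a full re-extract into a staging table each cycle (e.g. a nightly Sqoop pull) followed by an atomic INSERT OVERWRITE the usual approach? A sketch with placeholder names:

    -- Sketch only: stg_country is assumed to hold a full nightly extract of
    -- the source lookup; the overwrite swaps the data in a single step.
    INSERT OVERWRITE TABLE lookup_country
    SELECT country_code,
           country_name,
           last_updated
    FROM stg_country;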


4. SCD1 for reference data

    What is the best way to implement SCD1? Is it to use workarounds such as INSERT OVERWRITE in Hive, or to write Java MapReduce/Spark code to implement it?
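
The insert overwrite workaround we are referring to would look roughly like the sketch below (all names are placeholders): rebuild the dimension from a FULL OUTER JOIN of the current dimension and the incoming delta, letting the delta win. We understand Hive 2.2+ ACID tables also support MERGE, which might avoid the full rewrite.

    -- Sketch only: dim_product / stg_product_delta are placeholders.
    -- Type 1: incoming values simply replace the old ones, no history kept.
    INSERT OVERWRITE TABLE dim_product
    SELECT
        COALESCE(s.product_nk,   d.product_nk)   AS product_nk,
        COALESCE(s.product_name, d.product_name) AS product_name,
        COALESCE(s.category,     d.category)     AS category
    FROM dim_product d
    FULL OUTER JOIN stg_product_delta s
      ON d.product_nk = s.product_nk;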


5. SCD2 for history

    What is the best way to implement SCD2? Is it to use workarounds such as INSERT OVERWRITE with UNION ALL and analytical functions in Hive, or to write Java MapReduce/Spark code to implement it?
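
The insert overwrite / UNION ALL workaround we are referring to would look roughly like the sketch below: one branch carries existing rows forward and closes the open version where a change arrived, the other appends the new versions. All names and the eff_date/exp_date layout are assumptions, and surrogate key assignment is omitted (it could reuse the pattern from question 2).

    -- Sketch only: dim_customer_hist / stg_customer_delta and the
    -- eff_date/exp_date columns are assumptions about the target layout.
    INSERT OVERWRITE TABLE dim_customer_hist
    SELECT * FROM (
        -- carry existing rows forward, closing the open version on change
        SELECT d.customer_sk,
               d.customer_nk,
               d.customer_name,
               d.eff_date,
               CASE WHEN s.customer_nk IS NOT NULL
                         AND d.exp_date = '9999-12-31'
                    THEN s.load_date
                    ELSE d.exp_date END AS exp_date
        FROM dim_customer_hist d
        LEFT JOIN stg_customer_delta s
          ON d.customer_nk = s.customer_nk
        UNION ALL
        -- append the new versions from the delta with an open-ended expiry
        SELECT CAST(NULL AS BIGINT) AS customer_sk,   -- key assignment omitted in this sketch
               s.customer_nk,
               s.customer_name,
               s.load_date  AS eff_date,
               '9999-12-31' AS exp_date
        FROM stg_customer_delta s
    ) merged;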

Explorer
Posts: 9
Registered: 04-20-2019

Re: Offloading ETL to Hadoop

Hey Vikas,

Did you find any solution here?
