
Offloading ETL to Hadoop


New Contributor

We are serious about offloading ETL to Hadoop. The existing data models are highly normalized in the database's current integration layer. Below are our specific questions:

 

1. Dimension modelling

    A star schema has been tried on Hive, but the numerous joins have been a bit of an overkill (queries run long and error out). Are there recommendations on file formats and compression to suit this use case? Are there alternate modelling techniques that can be used?
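
    To make the question concrete, this is roughly the kind of fact-table layout we have been experimenting with (table name, columns, and bucket count are illustrative, not our real model):

        -- ORC storage with compression, date partitioning to prune scans,
        -- and bucketing/sorting on the hot join key so Hive can use
        -- bucket-map joins instead of full shuffle joins.
        CREATE TABLE sales_fact (
          order_id    BIGINT,
          customer_sk BIGINT,
          product_sk  BIGINT,
          amount      DECIMAL(18,2)
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (customer_sk) SORTED BY (customer_sk) INTO 32 BUCKETS
        STORED AS ORC
        TBLPROPERTIES ('orc.compress'='ZLIB');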


2. Surrogate key/foreign key - Normalization

    Generating surrogate keys is something we feel is essential on the data warehouse side. Is there a way to generate keys similar to database sequences?
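
    The closest workaround we know of is offsetting ROW_NUMBER() by the current maximum key, along these lines (customer_dim and staging_customers are hypothetical tables):

        -- New surrogate key = current MAX(key) + ROW_NUMBER().
        -- Unlike a real database sequence, this is NOT safe if two loads
        -- write into the same dimension table concurrently.
        INSERT INTO TABLE customer_dim
        SELECT
          mx.max_sk + ROW_NUMBER() OVER (ORDER BY s.customer_id) AS customer_sk,
          s.customer_id,
          s.customer_name
        FROM staging_customers s
        CROSS JOIN (
          SELECT COALESCE(MAX(customer_sk), 0) AS max_sk FROM customer_dim
        ) mx;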


3. Data refresh for lookups without incremental CDC

    What could be the ideal way to refresh lookups in Hadoop?
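
    Absent CDC, the only pattern we can think of is a full reload per cycle, along these lines (country_lookup and its staging table are hypothetical):

        -- Full reload each cycle: INSERT OVERWRITE replaces the old
        -- snapshot with the new one in a single table-level swap, so
        -- readers never see a half-written lookup.
        INSERT OVERWRITE TABLE country_lookup
        SELECT country_code, country_name
        FROM staging_country_lookup;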


4. SCD1 for reference data

    What is the best way to implement SCD1? Is it to use workarounds such as INSERT OVERWRITE in Hive, or to write Java MapReduce/Spark code for it?
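
    By the insert-overwrite workaround we mean roughly the following reconcile pattern (product_ref and staging_products are hypothetical; it relies on Hive materializing the SELECT before the overwrite, and treats a staged NULL as "keep the old value"):

        -- SCD1 reconcile: staged attribute values win; rows untouched by
        -- this staging load carry over unchanged.
        INSERT OVERWRITE TABLE product_ref
        SELECT
          COALESCE(s.product_id, t.product_id)     AS product_id,
          COALESCE(s.product_name, t.product_name) AS product_name,
          COALESCE(s.category, t.category)         AS category
        FROM product_ref t
        FULL OUTER JOIN staging_products s
          ON t.product_id = s.product_id;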


5. SCD2 for history

    What is the best way to implement SCD2? Is it to use workarounds such as INSERT OVERWRITE with UNION ALL and analytic functions in Hive, or to write Java MapReduce/Spark code for it?
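
    The insert overwrite plus UNION ALL workaround we have in mind looks roughly like this (customer_hist and staging_customers are hypothetical; assumes staging carries only new/changed records, and CURRENT_DATE needs Hive 1.2+):

        -- SCD2 rewrite in two branches:
        --   1) existing history, closing out the current row whenever a
        --      changed record for that customer arrived in staging;
        --   2) the staged records appended as the new current versions.
        INSERT OVERWRITE TABLE customer_hist
        SELECT * FROM (
          SELECT h.customer_id,
                 h.address,
                 h.eff_date,
                 CASE WHEN h.is_current = 'Y' AND s.customer_id IS NOT NULL
                      THEN CURRENT_DATE ELSE h.end_date END AS end_date,
                 CASE WHEN h.is_current = 'Y' AND s.customer_id IS NOT NULL
                      THEN 'N' ELSE h.is_current END AS is_current
          FROM customer_hist h
          LEFT JOIN (SELECT DISTINCT customer_id FROM staging_customers) s
            ON h.customer_id = s.customer_id
          UNION ALL
          SELECT s.customer_id,
                 s.address,
                 CURRENT_DATE               AS eff_date,
                 CAST('9999-12-31' AS DATE) AS end_date,
                 'Y'                        AS is_current
          FROM staging_customers s
        ) u;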

1 Reply

Re: Offloading ETL to Hadoop

Explorer
Hey Vikas,

Did you ever find a solution for this?

Learner