Data Modeling in Big Data - Star schema into Hive or One Big Table?

Explorer

Hi experts, I have four .CSV files (three dimensions and one fact table) in my HDFS. I have already done some data cleansing in Apache Pig and I want to load them into Hive. My question is: is it a good idea to create the star schema in Hive, or is it better to create one big table? I didn't find any good article that explains which is the better way to apply data modeling in Big Data. Many thanks!

1 ACCEPTED SOLUTION

Re: Data Modeling in Big Data - Star schema into Hive or One Big Table?

Guru

How big are your dimension tables? For the best query speed, some denormalization will help. However, given the various improvements to Hive, if your dimension tables are small enough for a map join, you may not see much difference between the two approaches.
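
To make the map join point concrete, here is a minimal Python sketch of the idea behind it: the small dimension table is loaded into an in-memory hash table, and the large fact table is streamed past it once, so no shuffle of the fact data is needed. This is only an illustration of the technique, not Hive's implementation; the table and column names (`dim_product`, `fact_sales`, `product_id`) are made up for the example. In Hive itself, automatic conversion to a map join is controlled by settings such as `hive.auto.convert.join`.

```python
from typing import Dict, Iterable, Iterator


def map_side_join(
    fact_rows: Iterable[Dict],
    dim_rows: Iterable[Dict],
    key: str,
) -> Iterator[Dict]:
    """Inner-join a large fact stream against a small dimension table."""
    # Build phase: the small dimension table must fit in memory.
    lookup = {row[key]: row for row in dim_rows}
    # Probe phase: stream the fact rows once, looking each key up in the hash table.
    for fact in fact_rows:
        dim = lookup.get(fact[key])
        if dim is not None:  # inner join: drop facts with no matching dimension row
            yield {**dim, **fact}


# Hypothetical sample data standing in for the CSV files.
dim_product = [
    {"product_id": 1, "product_name": "widget"},
    {"product_id": 2, "product_name": "gadget"},
]
fact_sales = [
    {"sale_id": 100, "product_id": 1, "amount": 9.5},
    {"sale_id": 101, "product_id": 2, "amount": 20.0},
    {"sale_id": 102, "product_id": 3, "amount": 5.0},  # no matching product
]

joined = list(map_side_join(fact_sales, dim_product, "product_id"))
for row in joined:
    print(row)
```

The cost of the join is dominated by a single pass over the fact table, which is why a star schema with small dimensions can perform close to one big pre-joined table. Once a dimension no longer fits in memory, this shortcut stops applying and denormalization becomes more attractive.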

Re: Data Modeling in Big Data - Star schema into Hive or One Big Table?

Explorer

Thanks Ravi :) Can you recommend any articles that explain methodologies for data modeling in Big Data? My dimensions are big, with a lot of columns...

Re: Data Modeling in Big Data - Star schema into Hive or One Big Table?

Super Guru

Hi Johnny

I would also suggest you consider complex types in Hive. They let you store related data together in a single row without duplicating it, and because you are not creating normalized tables, you avoid potentially expensive joins.

So think about nested data types like struct, map and array. This is a good middle ground between normalization and denormalization. It doesn't take as much space as a fully denormalized table, and at the same time queries are not as expensive as in a normalized model, because you avoid expensive joins.
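
As a rough illustration of that middle ground, the Python sketch below takes a fully denormalized set of rows (order-level fields repeated on every line item) and folds it into one record per order with a nested list of items. The data and the field names (`order_id`, `customer`, `sku`, `qty`) are invented for the example. In Hive, the nested `items` field would correspond to a column declared with a type along the lines of `ARRAY<STRUCT<sku:STRING, qty:INT>>`.

```python
from typing import Dict, List

# Fully denormalized layout: order-level fields are repeated on every item row.
flat_rows = [
    {"order_id": 1, "customer": "alice", "sku": "A1", "qty": 2},
    {"order_id": 1, "customer": "alice", "sku": "B7", "qty": 1},
    {"order_id": 2, "customer": "bob", "sku": "A1", "qty": 5},
]


def nest_orders(rows: List[Dict]) -> List[Dict]:
    """Fold flat rows into one record per order with a nested list of items."""
    orders: Dict[int, Dict] = {}
    for row in rows:
        order = orders.setdefault(
            row["order_id"],
            {"order_id": row["order_id"], "customer": row["customer"], "items": []},
        )
        # Each item plays the role of a struct; the list plays the role of an array.
        order["items"].append({"sku": row["sku"], "qty": row["qty"]})
    return list(orders.values())


def total_qty(order: Dict) -> int:
    """Aggregate over the nested items without joining to another table."""
    return sum(item["qty"] for item in order["items"])


nested = nest_orders(flat_rows)
for order in nested:
    print(order["order_id"], order["customer"], total_qty(order))
```

The order-level fields are stored once per order instead of once per item, and a per-order aggregate only has to read that one row, which is the trade-off described above.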

Re: Data Modeling in Big Data - Star schema into Hive or One Big Table?

Explorer

Hi mqureshi, many thanks for your help :) I will look for good articles/tutorials that show how to use complex types in Hive. Thanks!

Re: Data Modeling in Big Data - Star schema into Hive or One Big Table?

Explorer

João Souza, if you find any articles, could you share them here? Many thanks!
