Data Modeling in Big Data - Star schema into Hive or One Big Table?

Contributor

Hi experts, I have four .CSV files (three dimensions and one fact table) in HDFS. I have already done some data cleansing in Apache Pig, and I want to load them into Hive. My question is: is it a good idea to create the star schema in Hive, or is it better to create one big table? I haven't found any good article that explains which is the better way to approach data modeling in Big Data. Many thanks!

1 ACCEPTED SOLUTION

Guru

How big are your dimension tables? For the best speed, some denormalization will help. However, with the various improvements to Hive, and if your dimension tables are small enough for a map join, you may not see much difference between the two.
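
To illustrate why small dimension tables narrow the gap, here is a minimal conceptual sketch of what a map-side join does: the small dimension table is held in memory as a hash map and the large fact table is streamed past it, so no shuffle of the fact data is needed. This is plain Python, not Hive code; the table contents, the column names, and the `map_join` helper are all hypothetical. In Hive itself, automatic conversion to a map join is controlled by settings such as `hive.auto.convert.join`.

```python
# Conceptual sketch of a map-side (broadcast hash) join. Not Hive code.
# All table contents and column names below are hypothetical.

# Small dimension table: fits in memory, so it is loaded into a hash map
# keyed by the join column.
dim_product = {
    1: {"product_name": "Laptop", "category": "Electronics"},
    2: {"product_name": "Desk", "category": "Furniture"},
}

# Large fact table: in a real system this is streamed, never fully loaded.
fact_sales = [
    {"sale_id": 100, "product_id": 1, "amount": 1200.0},
    {"sale_id": 101, "product_id": 2, "amount": 300.0},
    {"sale_id": 102, "product_id": 1, "amount": 1150.0},
]


def map_join(fact_rows, dim_lookup, key):
    """Join each fact row to its dimension row with an in-memory lookup."""
    for row in fact_rows:
        dim_row = dim_lookup.get(row[key])
        if dim_row is not None:  # inner-join semantics: skip unmatched keys
            yield {**row, **dim_row}


joined = list(map_join(fact_sales, dim_product, "product_id"))
for row in joined:
    print(row["sale_id"], row["product_name"], row["category"], row["amount"])
```

Each fact row costs one dictionary lookup, which is why the star schema can perform close to a pre-joined table as long as the dimension side stays small. With large dimensions, as in this thread, that assumption may not hold.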


5 REPLIES 5

Contributor

Thanks Ravi 🙂 Can you recommend any article that explains methodologies for data modeling in Big Data? My dimensions are big, with a lot of columns...

Super Guru

Hi Johnny

I would also suggest you consider complex types in Hive. They let you store related data together in a single row without duplicating it and, because you are not creating normalized tables, you avoid potentially expensive joins.

So think about nested data types like struct, map and array. This is a good middle ground between normalization and denormalization: it doesn't take as much space as a fully denormalized table, and queries are not as expensive as in a normalized model because you avoid the joins.
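
As a rough illustration of the idea, the sketch below folds a fully denormalized table (one row per order line, with the customer attributes repeated on every row) into one nested row per order. It is plain Python rather than Hive code, and the data, the field names, and the `nest_orders` helper are hypothetical. The dictionary stands in for a Hive STRUCT, the list for an ARRAY, and the item-to-quantity dictionary for a MAP.

```python
# Conceptual sketch: flat (denormalized) rows versus one nested row per order.
# Not Hive code. All data and field names below are hypothetical.

flat_rows = [
    {"order_id": 1, "customer_name": "Ana", "customer_city": "Lisbon",
     "item": "pen", "qty": 2},
    {"order_id": 1, "customer_name": "Ana", "customer_city": "Lisbon",
     "item": "ink", "qty": 1},
    {"order_id": 2, "customer_name": "Rui", "customer_city": "Porto",
     "item": "pad", "qty": 5},
]


def nest_orders(rows):
    """Group flat rows by order, storing customer attributes once per order."""
    orders = {}
    for r in rows:
        order = orders.setdefault(r["order_id"], {
            "order_id": r["order_id"],
            # struct-like: customer fields kept together, stored once
            "customer": {"name": r["customer_name"],
                         "city": r["customer_city"]},
            # array-like: one entry per order line
            "items": [],
            # map-like: item name -> quantity
            "qty_by_item": {},
        })
        order["items"].append(r["item"])
        order["qty_by_item"][r["item"]] = r["qty"]
    return list(orders.values())


nested = nest_orders(flat_rows)
for order in nested:
    print(order["order_id"], order["customer"]["city"], order["items"])
```

The customer details appear once per order instead of once per order line, and reading an order with its items needs no join.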

Rising Star

Hi mqureshi, many thanks for your help 🙂 I will look for good articles and tutorials that show how to use complex types in Hive. Thanks!

Contributor

João Souza, if you find a good article, could you share it here? Many thanks!