Hive Data Model

Hello hive users,

We are looking at migrating files (less than 5 MB of data in total) with variable record lengths from a mainframe system to Hive. You could think of this as metadata. Each record can have anywhere from 3 to n columns, depending on its record type (that is, each record type has a different number of columns). What would be the best strategy for migrating this to Hive? I was thinking of converting these files into one variable-length CSV file and then importing it into a Hive table. The Hive table would consist of 4 columns, with the 4th column holding a comma-separated list of the values from columns 4 to n. Are there alternative or better approaches? I'd appreciate any feedback on this.
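
For illustration, here is a minimal Python sketch of the four-column idea described above. It assumes the mainframe records have already been parsed into lists of strings; the function names (`collapse_record`, `to_four_column_tsv`) and the sample records are hypothetical. It also assumes a tab as the outer delimiter, so that the commas inside the 4th column do not collide with the column separator.

```python
import csv
import io


def collapse_record(fields, fixed=3):
    """Keep the first `fixed` fields and join the rest into one comma-separated field."""
    if len(fields) < fixed:
        raise ValueError(f"expected at least {fixed} fields, got {len(fields)}")
    # Records with exactly `fixed` fields get an empty 4th column.
    return fields[:fixed] + [",".join(fields[fixed:])]


def to_four_column_tsv(records):
    """Render all records as tab-separated text with exactly four columns per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for fields in records:
        writer.writerow(collapse_record(fields))
    return out.getvalue()


# Hypothetical sample data: one 3-column record and one 5-column record.
records = [
    ["A", "001", "x"],
    ["B", "002", "y", "p", "q"],
]
print(to_four_column_tsv(records))
```

Once the file is loaded into a table with tab-delimited fields, the 4th column could be unpacked at query time with Hive's `split()` function, which turns the comma-separated string into an array.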

Re: Hive Data Model

Can new columns be added or removed?  Does that happen often?  What is the access pattern of the users or applications?


Based on the small file size and the variable number of columns, I would lean towards HBase.  More info is needed to fully evaluate this (e.g. if the app or users use and know SQL, that would add more weight to using Hive).


Now, if you have to use Hive, I would flesh out each record to align the columns.
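
One way to read "flesh out each record" is to pad every record to the width of the longest one, so that all rows fit a single fixed table schema. Here is a rough Python sketch under that assumption; `align_records` and the sample data are hypothetical, and the fill value is a choice you would need to make for your data.

```python
def align_records(records, fill=""):
    """Pad every record with `fill` so all records have the same number of columns."""
    # Width of the longest record; 0 if there are no records at all.
    width = max((len(r) for r in records), default=0)
    return [list(r) + [fill] * (width - len(r)) for r in records]


# Hypothetical sample data: record types with 3 and 5 columns.
records = [
    ["A", "001", "x"],
    ["B", "002", "y", "p", "q"],
]
aligned = align_records(records)
```

With Hive's default text format, passing `fill="\\N"` instead of an empty string should make the padded cells read back as NULL rather than as empty strings.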