
Hive Data Model

Champion Alumni

Hello Hive users,

 
We are looking at migrating files (less than 5 MB of data in total) with variable record lengths from a mainframe system to Hive. You could think of this as metadata. Each record can have anywhere from 3 to n columns depending on its record type (i.e., each record type has a different number of columns). What would be the best strategy to migrate this to Hive? I was thinking of converting these files into one variable-length CSV file and then importing it into a Hive table. The Hive table would consist of 4 columns, with the 4th column holding a comma-separated list of the values from columns 4 to n (see the sketch below). Are there other alternatives or better approaches? I would appreciate any feedback.
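A minimal sketch of that layout, assuming the exported file uses a pipe as the top-level field delimiter so the embedded commas in the list column survive; all table names, column names, and paths below are hypothetical:

```sql
-- Hypothetical record layout: record_type|field1|field2|v4,v5,...,vn
CREATE TABLE mainframe_records (
  record_type STRING,
  col1        STRING,
  col2        STRING,
  extra_cols  ARRAY<STRING>   -- holds columns 4..n as a list
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load the converted file (path is hypothetical)
LOAD DATA INPATH '/landing/mainframe/records.txt'
  INTO TABLE mainframe_records;

-- Individual values can then be reached by position
SELECT record_type, extra_cols[0] FROM mainframe_records;
```

Declaring the 4th column as ARRAY<STRING> instead of one long delimited string keeps the individual values addressable from SQL without extra string parsing.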
 
Thanks,
Nishanth
1 REPLY

Re: Hive Data Model

Champion

Can new columns be added or removed? Does that happen often? What is the access pattern of the users or applications?

 

Based on the small file size and the variable column length, I would lean towards HBase. More information is needed to fully evaluate (e.g., if the app or users know and use SQL, that would add more weight to using Hive); one way to get both is sketched below.
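As an illustration only (the table name, column family "d", and column names are made up): store each record as an HBase row whose column family carries however many columns that record type has, and expose it to SQL users through a Hive external table backed by the HBase storage handler, mapping the whole column family to a Hive MAP so no schema change is needed when record types differ.

```sql
-- Hypothetical Hive table mapped onto an existing HBase table 'mainframe_records';
-- the entire column family 'd' is exposed as a map, so each record can carry a
-- different number of columns.
CREATE EXTERNAL TABLE mainframe_records_hbase (
  record_key  STRING,
  fields      MAP<STRING, STRING>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:")
TBLPROPERTIES ("hbase.table.name" = "mainframe_records");

-- SQL users can still query individual fields by name:
SELECT record_key, fields['col_07'] FROM mainframe_records_hbase;
```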

 

Now, if you have to use Hive, I would flesh out each record to align the columns, padding the missing fields so every row has the same width (see the sketch below).
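A sketch of that, assuming the widest record type has n columns; Hive's text SerDe fills any missing trailing fields with NULL, so shorter record types do not need special handling. Column names are hypothetical.

```sql
-- Hypothetical "fleshed out" layout: every row carries the full set of columns,
-- with NULLs where a record type has no value for a column. Rows with fewer
-- delimited fields than declared columns are padded with NULL automatically.
CREATE TABLE mainframe_records_wide (
  record_type STRING,
  col_01      STRING,
  col_02      STRING,
  col_03      STRING,
  -- ... one column per possible field, up to the widest record type ...
  col_n       STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```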