We are looking at migrating files (less than 5 MB of data in total) with variable record lengths from a mainframe system to Hive. You could think of this as metadata. Each record can have anywhere from 3 to n columns, depending on its record type (that is, each record type has a different number of columns). What would be the best strategy to migrate this to Hive? I was thinking of converting these files into one variable-length CSV file and then importing it into a Hive table. The Hive table would consist of 4 columns, with the 4th column holding a comma-separated list of the values from columns 4 to n. Are there alternative or better approaches? I would appreciate any feedback on this.
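For illustration, the four-column idea could be sketched roughly as below. This is only a sketch under assumptions: the records are assumed to be already parsed into lists of strings, the function names and sample values are made up, and a tab is used as the outer delimiter so that the commas inside the 4th column do not clash with it. It also assumes no value contains a tab or a comma.

```python
def to_four_columns(fields):
    """Keep the first three fields; pack fields 4..n into one comma-joined string."""
    if len(fields) < 3:
        raise ValueError("expected at least 3 fields per record")
    head, rest = fields[:3], fields[3:]
    return head + [",".join(rest)]


def to_tsv_line(fields):
    """Render one record as a tab-delimited line with exactly four columns."""
    return "\t".join(to_four_columns(fields))


# Hypothetical sample records of two different record types.
records = [
    ["A", "001", "x"],
    ["B", "002", "y", "p", "q"],
]

for record in records:
    print(to_tsv_line(record))
```

On the Hive side, the 4th column could then be loaded as a plain string and broken apart at query time with the built-in `split` function.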
Can new columns be added or removed? Does that happen often? What is the access pattern of the users or applications?
Based on the small file size and the variable number of columns, I would lean towards HBase. More information is needed to evaluate this fully (e.g. if the app or its users already know and use SQL, that would add more weight to using Hive).
Now, if you have to use Hive, I would flesh out each record, padding it to the full set of columns, so that the columns align across record types.
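A minimal sketch of that padding step, assuming the records have already been parsed into lists of strings. The function name, the fill value, and the sample data are hypothetical. Note that this only equalizes the column count; it does not decide which column means what for each record type, which would still need a per-type mapping.

```python
def align_records(records, fill=""):
    """Pad every record with `fill` so all records have the same number of columns."""
    width = max((len(r) for r in records), default=0)
    return [r + [fill] * (width - len(r)) for r in records]


# Hypothetical sample: a 3-column record type and a 5-column record type.
records = [
    ["A", "001", "x"],
    ["B", "002", "y", "p", "q"],
]

for row in align_records(records):
    print(",".join(row))
```

If the padded fields should show up as NULL rather than as empty strings in a Hive text table, passing `fill="\\N"` is one option, since `\N` is Hive's default NULL marker for delimited text.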