We have a Hive table whose "location" is an HDFS directory. That directory holds a bunch of small data files that make up the table's data. Data keeps arriving in the location directory, so numerous small files are created all the time.
For performance, though, we want to merge all these small files into a larger file on a periodic basis.
What's the best way to do this? I've heard that Hive itself has a merge option; is that right?