I have a requirement to compact small ORC files within HDFS directories by merging them into larger files, as an offline job.
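To make the grouping part of the requirement concrete, here is a minimal sketch of how the small files in a directory might be batched before each batch is merged into one larger file. It only plans the groups and does not read or write ORC data. The names `SmallFile`, `CompactionPlan`, and `targetBytes` are hypothetical, and the greedy "fill up to a target size" rule is an assumption, not a fixed part of the requirement.

```scala
// Hypothetical description of one small ORC file: its HDFS path and size in bytes.
final case class SmallFile(path: String, sizeBytes: Long)

object CompactionPlan {
  // Greedily packs files, in the order given, into groups whose total size
  // stays at or below targetBytes. A single file larger than targetBytes
  // ends up in a group of its own.
  def group(files: Seq[SmallFile], targetBytes: Long): Vector[Vector[SmallFile]] = {
    require(targetBytes > 0, "targetBytes must be positive")
    val (done, current, _) =
      files.foldLeft((Vector.empty[Vector[SmallFile]], Vector.empty[SmallFile], 0L)) {
        case ((groups, cur, curSize), f) =>
          if (cur.nonEmpty && curSize + f.sizeBytes > targetBytes)
            // Adding f would overflow the current group: close it and start a new one.
            (groups :+ cur, Vector(f), f.sizeBytes)
          else
            (groups, cur :+ f, curSize + f.sizeBytes)
      }
    // Keep the last, partially filled group if there is one.
    if (current.nonEmpty) done :+ current else done
  }
}
```

Each resulting group would then become one output file, for example by passing `group(files, 256L * 1024 * 1024)` a listing of the directory; the 256 MB target is only an illustrative value.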
Approach 1: Using a Spark DataFrame (I know how to do this).
Approach 2: Using the Hadoop and ORC libraries directly from Scala (I don't know how to do this).
Could you please help me with Approach 2? Sample code or guidance would be a great help.