I have 1 TB of data stored as a Sequence file in 10 partitions. I read it as a sequence file, convert it to a DataFrame, and finally write it to a Hive ORC table with an INSERT OVERWRITE statement.
My cluster uses the capacity scheduler.
In this case, what would be the best configuration for the number of executors, driver memory, and executor memory to load the data into the Hive table faster?
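For reference, the job is roughly shaped like the PySpark sketch below. This is only an illustration of the pipeline described above, not my exact code: the function names, the `staging_view` view name, and the `key`/`value` column names are placeholders, and it assumes the sequence file holds simple key/value pairs and that the target ORC table already exists with matching columns.

```python
def insert_overwrite_sql(target_table: str, source_view: str) -> str:
    """Build the INSERT OVERWRITE statement that loads the ORC table."""
    return f"INSERT OVERWRITE TABLE {target_table} SELECT * FROM {source_view}"


def load_sequence_file_to_hive(input_path: str, target_table: str) -> None:
    """Read a sequence file, convert it to a DataFrame, and overwrite a Hive table."""
    # Imported here so the SQL helper above can be used without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("seqfile-to-hive-orc")  # placeholder application name
        .enableHiveSupport()             # needed to write to Hive tables
        .getOrCreate()
    )

    # sequenceFile returns an RDD of (key, value) pairs.
    rdd = spark.sparkContext.sequenceFile(input_path)

    # Placeholder column names; the real schema depends on the data.
    df = rdd.toDF(["key", "value"])
    df.createOrReplaceTempView("staging_view")

    # Overwrite the existing ORC-backed Hive table with the new data.
    spark.sql(insert_overwrite_sql(target_table, "staging_view"))
    spark.stop()
```

The number of executors, driver memory, and executor memory are not set in the code; they would be passed at submit time, which is the part I am asking about.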