I have an ETL flow that transfers data from one Hive table to another using PySpark. The tables are partitioned. However, I see that each partition's path in HDFS contains small Parquet files. I want to ask:
1) How can I merge these files?
2) Is there a maximum or recommended size for Hive partitions?