I need to merge many small Parquet files into one larger file, because too many files on HDFS consume too much NameNode memory.
These small Parquet files have similar, but not identical, structures.
When I use parquet-tools to merge them, it throws: could not merge metadata key org.apache.spark.sql.parquet.row.metadata has conflicting values
Spark can do the merge, but it is too slow compared to parquet-tools.
Are there any good solutions? Thanks!