How to merge parquet files with different structures?

I need to merge some small Parquet files into one larger file, because too many small files on HDFS consume too much NameNode memory.

These small Parquet files have similar, but not identical, structures.
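To illustrate what "similar, but not identical" means for a merge, here is a minimal sketch in plain Python. It does not use any Parquet API; schemas are modeled as flat dicts of column name to type name, and the column names (`id`, `name`, `city`) are hypothetical. It assumes a merged file would need the union of all columns, with values missing from a given file filled in as nulls.

```python
from typing import Dict, List

# Simplified stand-in for a file schema: column name -> type name.
# Real Parquet schemas are nested; this sketch only covers flat ones.
Schema = Dict[str, str]


def union_schemas(schemas: List[Schema]) -> Schema:
    """Return the union of all columns, failing on a type conflict."""
    merged: Schema = {}
    for schema in schemas:
        for column, dtype in schema.items():
            if column in merged and merged[column] != dtype:
                raise ValueError(
                    f"column {column!r} has conflicting types: "
                    f"{merged[column]} vs {dtype}"
                )
            merged.setdefault(column, dtype)
    return merged


def conform_row(row: dict, schema: Schema) -> dict:
    """Reshape a row to the merged schema; absent columns become None."""
    return {column: row.get(column) for column in schema}


# Hypothetical schemas for two of the small files.
file_a = {"id": "int64", "name": "string"}
file_b = {"id": "int64", "name": "string", "city": "string"}

merged = union_schemas([file_a, file_b])
print(merged)
print(conform_row({"id": 1, "name": "x"}, merged))
```

Files whose columns differ only by additions can be unified this way; files that disagree on the type of the same column cannot, and would need an explicit cast first.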

When I use parquet-tools to merge them, it throws: could not merge metadata key org.apache.spark.sql.parquet.row.metadata has conflicting values

Spark would work, but it is too slow compared to parquet-tools.

Are there any good solutions? Thanks!