How to merge parquet files with different structures?

I need to merge some small Parquet files into one larger file, because too many small files on HDFS consume too much NameNode memory.

These small Parquet files have similar structures, but not identical ones.

When I use parquet-tools to merge them, it throws: could not merge metadata key org.apache.spark.sql.parquet.row.metadata has conflicting values

Spark would work, but it is too slow compared to parquet-tools.

Are there any good solutions? Thanks!
