
Insert into partitioned Parquet table with Impala 1.1.1 creates many small files


The Impala statement

INSERT INTO <parquet_table> PARTITION(...) SELECT * FROM <avro_table>

creates many Parquet files of roughly 350 MB each in every partition. The documentation, however, says:


"Parquet data files use a 1GB block size, so when deciding how finely to partition the data, try to find a granularity where each partition contains 1GB or more of data, rather than creating a large number of smaller files split among many partitions."
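The guideline quoted above comes down to simple arithmetic. Below is a minimal sketch in Python, assuming the "1GB" block is 1024³ bytes. The partition names and sizes are hypothetical, not taken from this post. For each partition, it compares the number of block-sized files the data would need with the number of ~350 MB files it would be split into.

```python
import math

# Assumption: "1GB block size" from the quoted documentation, taken as 1024**3 bytes.
PARQUET_BLOCK_BYTES = 1024 ** 3
MB = 1024 ** 2


def ideal_file_count(partition_bytes: int, block_bytes: int = PARQUET_BLOCK_BYTES) -> int:
    """Smallest number of files needed if each file holds at most one block of data."""
    return math.ceil(partition_bytes / block_bytes)


def undersized_partitions(sizes: dict, block_bytes: int = PARQUET_BLOCK_BYTES) -> list:
    """Partitions holding less than one block of data (too finely partitioned per the guideline)."""
    return sorted(name for name, size in sizes.items() if size < block_bytes)


if __name__ == "__main__":
    # Hypothetical partitions, for illustration only.
    sizes = {"year=2013/month=07": 3500 * MB, "year=2013/month=08": 700 * MB}
    for name, size in sorted(sizes.items()):
        # Number of files if the data is split into ~350 MB pieces, as reported in the post.
        at_350mb = math.ceil(size / (350 * MB))
        print(name, "block-sized files:", ideal_file_count(size), "~350 MB files:", at_350mb)
    print("below one block:", undersized_partitions(sizes))
```

With these numbers, the 3500 MB partition needs 4 block-sized files but is split into 10 files of ~350 MB. The 700 MB partition is below one block no matter how it is written.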


I use impalad version 1.1.1 RELEASE (build 83d5868f005966883a918a819a449f636a5b3d5f)


How can I increase the Parquet file size?