Insert into partitioned parquet table with Impala 1.1.1 creates many small files

The following Impala statement

INSERT INTO <parquet_table> PARTITION(...) SELECT * FROM <avro_table>

creates many ~350 MB Parquet files in every partition. However, the Impala documentation says:

"Parquet data files use a 1GB block size, so when deciding how finely to partition the data, try to find a granularity where each partition contains 1GB or more of data, rather than creating a large number of smaller files split among many partitions."

 

I am using impalad version 1.1.1 RELEASE (build 83d5868f005966883a918a819a449f636a5b3d5f).

How can I increase the Parquet file size?

 

Thanks,

Alex