Welcome to the Cloudera Community

Insert into partitioned parquet table with Impala 1.1.1 creates many small files

The Impala statement

INSERT INTO <parquet_table> PARTITION(...) SELECT * FROM <avro_table>

creates many Parquet files of about 350 MB each in every partition.

However, the Impala documentation says:

"Parquet data files use a 1GB block size, so when deciding how finely to partition the data, try to find a granularity where each partition contains 1GB or more of data, rather than creating a large number of smaller files split among many partitions."
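
To make the sizing guidance quoted above concrete, here is a rough Python sketch of the arithmetic. It is only an illustration under a simplifying assumption: that a partition's data is split evenly, with one output file per node that writes to the partition. The function names and the sample numbers are hypothetical and are not part of Impala.

```python
GIB = 1024 ** 3  # bytes in one GiB, used here as the 1GB block size from the quote


def estimate_file_size(partition_bytes: int, writer_nodes: int) -> float:
    """Approximate bytes per file, assuming an even split with one file per writing node."""
    if writer_nodes < 1:
        raise ValueError("writer_nodes must be at least 1")
    return partition_bytes / writer_nodes


def meets_block_size(partition_bytes: int, writer_nodes: int, block_bytes: int = GIB) -> bool:
    """True if each estimated file is at least one block in size."""
    return estimate_file_size(partition_bytes, writer_nodes) >= block_bytes


if __name__ == "__main__":
    # Hypothetical example: a partition of about 1.05 GiB written by 3 nodes.
    size = estimate_file_size(int(1.05 * GIB), 3)
    print(f"{size / 1024 ** 2:.0f} MiB per file")
    print("at least one block per file:", meets_block_size(int(1.05 * GIB), 3))
```

Under this assumption, a partition would need roughly 1 GB of data per writing node before every file reaches the block size, which may be one reason the files come out smaller than expected.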

 

I am using impalad version 1.1.1 RELEASE (build 83d5868f005966883a918a819a449f636a5b3d5f).

How can I increase the Parquet file size?


Thanks,

Alex
