Insert into partitioned Parquet table with Impala 1.1.1 creates many small files

The Impala statement

INSERT INTO <parquet_table> PARTITION(...) SELECT * FROM <avro_table>

creates many Parquet files of about 350 MB each in every partition.

According to the Impala documentation: "Parquet data files use a 1GB block size, so when deciding how finely to partition the data, try to find a granularity where each partition contains 1GB or more of data, rather than creating a large number of smaller files split among many partitions."
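The sizing advice in that quote comes down to simple arithmetic. The sketch below is written in Python purely for illustration; the helper names `estimate_files_per_partition` and `is_coarse_enough` are hypothetical and not part of Impala. It assumes the 1GB target block size from the quote and ignores compression and the number of nodes writing, so it only gives a lower bound on the number of files a partition needs if every file were written full.

```python
import math

# Assumption taken from the quoted documentation: Parquet data files
# target a 1 GB block size. Compression and the number of writer nodes
# are ignored, so results are lower bounds, not Impala's actual behavior.
PARQUET_BLOCK_BYTES = 1024 ** 3  # 1 GB


def estimate_files_per_partition(partition_bytes: int,
                                 block_bytes: int = PARQUET_BLOCK_BYTES) -> int:
    """Hypothetical helper: minimum number of full-size files for one partition."""
    if partition_bytes <= 0:
        return 0
    return math.ceil(partition_bytes / block_bytes)


def is_coarse_enough(partition_bytes: int,
                     block_bytes: int = PARQUET_BLOCK_BYTES) -> bool:
    """Hypothetical helper: True if a partition holds at least one full block,
    which is the granularity the documentation recommends."""
    return partition_bytes >= block_bytes


if __name__ == "__main__":
    MB = 1024 ** 2
    GB = 1024 ** 3
    for size in (350 * MB, 1 * GB, 5 * GB):
        print(f"{size / MB:.0f} MB -> min files: "
              f"{estimate_files_per_partition(size)}, "
              f"coarse enough: {is_coarse_enough(size)}")
```

By this arithmetic, a file of about 350 MB fills only roughly a third of a 1GB block (350 / 1024 ≈ 0.34), which is why such files count as "small" relative to the documented target.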


I am using impalad version 1.1.1 RELEASE (build 83d5868f005966883a918a819a449f636a5b3d5f).

How can I increase the size of the Parquet files?

Thanks,

Alex