Created on 06-13-2018 12:41 PM - edited 09-16-2022 06:20 AM
I did some experiments on Hive. No matter what I set the block size to, Hive always produces Parquet files of the same size, and there are a lot of small files. Here are the table properties. Can anyone help me? Thanks in advance!
SET hive.exec.dynamic.partition.mode=nonstrict;
SET parquet.column.index.access=true;
SET hive.merge.mapredfiles=true;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET parquet.compression=SNAPPY;
SET dfs.block.size=445644800;
SET parquet.block.size=445644800;
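For reference, the file count and total size the job actually wrote can be checked from Hive itself; mytable below is a placeholder for the real table name:

-- numFiles and totalSize show up under Table Parameters when statistics are up to date
DESCRIBE FORMATTED mytable;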
Created 06-14-2018 05:06 AM
You mentioned there are a lot of small files, and you set block.size to 445644800 (roughly 445 MB).
If your block.size is larger than the small files, you will not see any difference.
For example, all of the comparisons below give the same result:
445 MB > 1 MB
400 MB > 1 MB
300 MB > 1 MB
200 MB > 1 MB
100 MB > 1 MB
10 MB > 1 MB
2 MB > 1 MB
You may only see a difference in file size once you set block.size smaller than the small files.
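Following that reasoning, an illustrative experiment would be to drop the block size below the current file size and re-run the insert. The 32 MB value here is arbitrary and assumes the existing files are larger than that:

-- Per the point above, only a block size below the actual file size can change the outcome
SET dfs.block.size=33554432;     -- 32 MB
SET parquet.block.size=33554432; -- keep the Parquet row-group size in step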
Created 05-13-2021 07:15 PM
How exactly do you increase the size of the files created by the Hive job, then?
Created 04-20-2022 01:14 AM
You can use one of these approaches to reduce the number of files produced by an INSERT query, which in turn increases the file sizes:
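A sketch of the usual options; the thresholds and table names are placeholders, not values from this thread:

-- Option 1: let Hive merge small output files at the end of the job
SET hive.merge.mapfiles=true;                 -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;              -- merge outputs of map-reduce jobs
SET hive.merge.tezfiles=true;                 -- merge outputs when running on Tez
SET hive.merge.smallfiles.avgsize=256000000;  -- trigger a merge when the average output file is below ~256 MB
SET hive.merge.size.per.task=256000000;       -- target size for the merged files

-- Option 2: route each partition's rows through a single reducer,
-- assuming the partition column dt is the last column of source
INSERT OVERWRITE TABLE target PARTITION (dt)
SELECT * FROM source
DISTRIBUTE BY dt;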