
hive set block size not working

New Contributor

I did some experiments with Hive. It looks like no matter what value I set for the block size, Hive always produces Parquet files of the same size, and there are a lot of small files. Here are the settings I am using. Can anyone help me? Thanks in advance!

 

SET hive.exec.dynamic.partition.mode=nonstrict;
SET parquet.column.index.access=true;
SET hive.merge.mapredfiles=true;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET parquet.compression=SNAPPY;
SET dfs.block.size=445644800;
SET parquet.block.size=445644800;

3 Replies

Champion

@KeepCalmNCode

 

You mentioned that there are a lot of small files, and you set the block size to 445644800 (roughly 445 MB).

 

If your block size is larger than the small file, then you will not see any difference.

 

For example, all of the following give the same result:

445 MB > 1 MB
400 MB > 1 MB
300 MB > 1 MB
200 MB > 1 MB
100 MB > 1 MB
10 MB > 1 MB
2 MB > 1 MB

 

You will only see a difference in file size if you set the block size smaller than the file.
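If your goal is fewer, larger output files, the merge settings are what control that, not the block size. A minimal sketch follows; the hive.merge.* properties are standard Hive settings, but the 256 MB and 128 MB thresholds are illustrative values to tune for your cluster:

-- Merge small output files at the end of map-only and map-reduce jobs
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
-- SET hive.merge.tezfiles=true;   -- use this one instead if you run on Tez
-- Target size for each merged file (~256 MB here)
SET hive.merge.size.per.task=268435456;
-- If the average output file is smaller than this (~128 MB), run an extra merge pass
SET hive.merge.smallfiles.avgsize=134217728;

With these set, Hive launches a follow-up merge job that combines the small Parquet files into files close to the size.per.task target.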


New Contributor

How exactly do you increase the size of the files created by the Hive job, then?

New Contributor

You can use one of these clauses to reduce the number of files produced by the insert query, which will increase the file size:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-Syntaxof...
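For example, adding DISTRIBUTE BY on the partition column routes all rows for a given partition value to the same reducer, so each partition is written as one larger file instead of many small ones. A minimal sketch with hypothetical table and column names (target_table, staging_table, col1, col2, dt):

-- Aim for roughly one output file per ~256 MB of reducer input
SET hive.exec.reducers.bytes.per.reducer=268435456;

INSERT OVERWRITE TABLE target_table PARTITION (dt)
SELECT col1, col2, dt        -- dynamic partition column must come last
FROM staging_table
DISTRIBUTE BY dt;            -- all rows for one dt value go to one reducer

This relies on hive.exec.dynamic.partition.mode=nonstrict, which the original post already sets.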