New Contributor
Posts: 1
Registered: ‎05-29-2018

hive set block size not working

I ran some experiments on Hive. It looks like no matter what value I set for the block size, Hive always produces Parquet files of the same size, and there are a lot of small files. Here are the properties I set. Can anyone help me? Thanks in advance!

 

SET hive.exec.dynamic.partition.mode=nonstrict;

SET parquet.column.index.access=true;

SET hive.merge.mapredfiles=true;

SET hive.exec.compress.output=true;

SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

SET mapred.output.compression.type=BLOCK;

SET parquet.compression=SNAPPY;

SET dfs.block.size=445644800;

SET parquet.block.size=445644800;

Posts: 468
Topics: 14
Kudos: 77
Solutions: 41
Registered: ‎09-02-2016

Re: hive set block size not working

@KeepCalmNCode

 

You mentioned that there are a lot of small files, and that you set the block size to 445644800 bytes (approximately 445 MB, or exactly 425 MiB).
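As a quick check of that figure, here is a small Python sketch. It is plain arithmetic and not part of Hive; the variable names are made up for illustration.

```python
# Convert the block size from the original post into MB and MiB.
BLOCK_SIZE_BYTES = 445644800  # value set for dfs.block.size and parquet.block.size

size_mb = BLOCK_SIZE_BYTES / 1_000_000       # decimal megabytes
size_mib = BLOCK_SIZE_BYTES / (1024 * 1024)  # binary mebibytes

print(f"{size_mb:.1f} MB, {size_mib:.0f} MiB")
```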

 

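If your block size is larger than your small files, you will not see any difference. The block size is only an upper limit, so a file smaller than one block is not padded out to the block size.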

 

For example, with a 1 MB file, all of the block sizes below give the same result:

445 MB > 1 MB 

400 MB > 1 MB

300 MB > 1 MB

200 MB > 1 MB

100 MB > 1 MB

10 MB > 1 MB

2 MB > 1 MB

 

You may only see a difference in file size when you set the block size smaller than the small files.
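To illustrate the point, here is a minimal Python sketch that counts how many blocks a file would span for a given block size. The `blocks_needed` helper and the 1 MB file are hypothetical and only model the arithmetic; this is not how Hive or HDFS compute anything internally.

```python
import math

MB = 1024 * 1024  # bytes per MB (binary), used only for this illustration


def blocks_needed(file_size: int, block_size: int) -> int:
    """Return how many blocks a file of file_size bytes would span."""
    return math.ceil(file_size / block_size)


small_file = 1 * MB  # hypothetical 1 MB output file

# Any block size at or above the file size leaves the file in a single block.
for block_mb in (445, 100, 10, 2):
    print(f"block size {block_mb} MB -> {blocks_needed(small_file, block_mb * MB)} block(s)")

# Only a block size smaller than the file changes the layout.
print(f"block size 0.5 MB -> {blocks_needed(small_file, MB // 2)} block(s)")
```

In this sketch the block count stays at one until the block size drops below the file size, which matches why changing the setting had no visible effect on the small files.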

 

 

 
