
hive set block size not working


New Contributor

I did some experiments on Hive. It looks like no matter what value I set for the block size, Hive always produced Parquet files of the same size, and there are a lot of small files. Here are the session settings I used. Can anyone help me? Thanks in advance!

 

SET hive.exec.dynamic.partition.mode=nonstrict;

SET parquet.column.index.access=true;

SET hive.merge.mapredfiles=true;

SET hive.exec.compress.output=true;

SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

SET mapred.output.compression.type=BLOCK;

SET parquet.compression=SNAPPY;

SET dfs.block.size=445644800;

SET parquet.block.size=445644800;

1 REPLY

Re: hive set block size not working

Champion

@KeepCalmNCode

 

You mentioned there are a lot of small files, and you set block.size to 445644800 bytes (about 425 MB).
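As a quick sanity check on the unit conversion (a minimal sketch in Python):

```python
# Convert the configured block size from bytes to mebibytes.
block_size_bytes = 445644800
block_size_mb = block_size_bytes / (1024 ** 2)
print(block_size_mb)  # 425.0
```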

 

If your block.size is larger than a small file, changing it makes no difference: each small file already fits inside a single block, so the output files come out the same.

 

For example, all of the following give the same result:

445 MB > 1 MB 

400 MB > 1 MB

300 MB > 1 MB

200 MB > 1 MB

100 MB > 1 MB

10 MB > 1 MB

2 MB > 1 MB

 

You may only see a difference in file sizes when you set block.size smaller than the small files.
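If the real goal is fewer, larger output files, the file-merge settings (rather than the block size) are what usually control this. A sketch using Hive's standard merge parameters; the 256 MB thresholds here are illustrative values, not a recommendation tuned for your cluster:

```sql
-- Merge small output files at the end of the job
SET hive.merge.mapfiles=true;       -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;    -- merge outputs of map-reduce jobs

-- Trigger a merge when the average output file is below this size
SET hive.merge.smallfiles.avgsize=268435456;  -- 256 MB

-- Target size for each merged output file
SET hive.merge.size.per.task=268435456;       -- 256 MB
```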