hive set block size not working

New Contributor

I ran some experiments in Hive. No matter what value I set for the block size, Hive always produces Parquet files of the same size, and there are a lot of small files. Here are the settings I used. Can anyone help me? Thanks in advance!

 

SET hive.exec.dynamic.partition.mode=nonstrict;

SET parquet.column.index.access=true;

SET hive.merge.mapredfiles=true;

SET hive.exec.compress.output=true;

SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

SET mapred.output.compression.type=BLOCK;

SET parquet.compression=SNAPPY;

SET dfs.block.size=445644800;

SET parquet.block.size=445644800;

Re: hive set block size not working

Champion

@KeepCalmNCode

 

You mentioned that there are a lot of small files, and that you set the block size to 445644800 bytes (approximately 445 MB).

 

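If the block size is larger than each small file, changing it makes no difference: the block size is only an upper limit, so a file smaller than one block keeps its own size.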

 

For example, all of the block sizes below give the same result for a 1 MB file:

445 MB > 1 MB 

400 MB > 1 MB

300 MB > 1 MB

200 MB > 1 MB

100 MB > 1 MB

10 MB > 1 MB

2 MB > 1 MB

 

You may see a difference in file size only when you set the block size smaller than the small files.
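
To make the comparison above concrete, here is a minimal sketch of the arithmetic, assuming the block size acts purely as an upper limit on how a file is split. The helper `blocks_needed` and the 1 MB file size are hypothetical, chosen only for illustration; this is not Hive or HDFS code.

```python
MB = 1000 * 1000  # decimal megabytes, matching the "445 MB approx" figure


def blocks_needed(file_size: int, block_size: int) -> int:
    """Return how many blocks a file of file_size bytes spans (ceiling division)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-file_size // block_size)


small_file = 1 * MB  # hypothetical 1 MB output file

# Every block size larger than the file leaves it in a single block.
for block_size in (445644800, 400 * MB, 300 * MB, 200 * MB,
                   100 * MB, 10 * MB, 2 * MB):
    print(block_size, blocks_needed(small_file, block_size))

# Only a block size below the file size changes how the file is split.
print(512 * 1000, blocks_needed(small_file, 512 * 1000))
```

Under these assumptions, every block size in the loop reports one block for the 1 MB file, and only the last call, with a block size below the file size, reports two.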

 

 

 
