
Hive Dynamic Partition creates multiple files per partition.

Contributor

Here is the log output when multiple files are created:

Partition raw.somedata {dt=2015-12-29, node=FA} stats: [numFiles=4, numRows=69669241, totalSize=1365304329, rawDataSize=1045038615]

I need each partition to contain exactly one file. We use the Parquet data format and want to create Parquet files with a 1 GB block size for better performance.

I set the following when running Hive, but the output is still split into multiple files:

SET dfs.block.size=1073741824;  
SET parquet.block.size=1073741824;

Thanks in advance!

2 REPLIES

Re: Hive Dynamic Partition creates multiple files per partition.

Cloudera Employee
How are you writing your data? Which command are you using?

Note that running separate INSERT commands creates one file per command, even when they write to the same partition.

Re: Hive Dynamic Partition creates multiple files per partition.

Contributor

Sorry for the late response.

I'm using a dynamic partition insert. Is there an option that forces a dynamic partition insert to create a single file per partition?

Setting the number of reducers to 1 would force every MR job to use a single reducer, but some jobs need more reducers depending on the complexity of the WHERE conditions.


SET hive.exec.compress.intermediate=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions.pernode=10000;
SET dfs.block.size=1073741824;
SET parquet.block.size=1073741824;

INSERT OVERWRITE TABLE raw.erd PARTITION (dt='${hivevar:dt}', node)
SELECT ...
FROM ...
WHERE ...

Thanks
Ben
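
One approach often suggested for this situation is to add a DISTRIBUTE BY clause on the dynamic partition column, so that all rows for a given node value are routed to the same reducer and each partition is written by a single writer. The sketch below reuses the statement from the post above; the SELECT, FROM, and WHERE parts remain elided as in the original, and it assumes that node is available as a column in the query and that the final stage of the job runs with reducers. It is an illustration under those assumptions, not a confirmed fix for this cluster.

```sql
-- Sketch only: the elided parts (...) are kept exactly as posted above.
-- Assumption: "node" is the dynamic partition column and is the last
-- column produced by the SELECT, as dynamic partition inserts require.
SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions.pernode=10000;
SET parquet.block.size=1073741824;

INSERT OVERWRITE TABLE raw.erd PARTITION (dt='${hivevar:dt}', node)
SELECT ...
FROM ...
WHERE ...
-- Route every row with the same node value to the same reducer,
-- so that only one reducer writes into each dt/node partition.
DISTRIBUTE BY node;
```

This does not force the reducer count to 1, so earlier stages of the job can still run with as many reducers as they need. The trade-off is that a very large partition is written by a single reducer, which can make that one task slow.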