I ran some experiments with Hive, and no matter what value I set for the block size, Hive always produced Parquet files of the same size: lots of small files. Here are the session properties I used. Can anyone help me? Thanks in advance!

SET hive.exec.dynamic.partition.mode=nonstrict;
SET parquet.column.index.access=true;
SET hive.merge.mapredfiles=true;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET parquet.compression=SNAPPY;
SET dfs.block.size=445644800;
SET parquet.block.size=445644800;
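For context, here is a minimal sketch of the merge-related settings that typically govern output file size in Hive on MapReduce. The byte values below are illustrative assumptions, not values taken from this post; parquet.block.size controls the row group size inside each file, while the hive.merge.* settings are what actually trigger Hive to combine small output files.

-- Hedged sketch: settings that usually drive Hive's small-file merge step.
-- The size values here are illustrative assumptions.
SET hive.merge.mapfiles=true;                 -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;              -- merge outputs of map-reduce jobs
SET hive.merge.smallfiles.avgsize=268435456;  -- merge when avg output file size is below this (256 MB)
SET hive.merge.size.per.task=445644800;       -- target size of each merged file (~425 MB)
SET parquet.block.size=445644800;             -- Parquet row group size, not the file size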