How to merge files on HDFS in Hive?

New Contributor

Hello, I want to merge the files in one HDFS partition. They are the results of the same Hive insert, executed at different times.

/apps/hive/warehouse/raw.db/XXX_XXXX/part_fecha_proceso=2018-12-05

Permission  Owner  Group  Size  Last Modified  Replication  BlockSize  Name
-rwxrwxrwx  hive  hadoop 54.25KB  11-12-2018 16:48:43  1  128MB 000000_0
-rwxrwxrwx  hive  hadoop 54.25KB  11-12-2018 16:51:53  1  128MB 000000_0_copy_1

I set these variables before the insert, but it did not work.

set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;  

I tried this on the HDP sandbox. Can you help me, please?

1 REPLY

Re: How to merge files on HDFS in Hive?

Mentor

@Javier Tapia

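Here is a simple way to merge the files, assuming they are plain text (concatenating the bytes of ORC or Parquet files does not produce a valid file). Pipe the output into hdfs dfs -put - so the result is written to HDFS; a plain > redirect would write it to the local filesystem instead.

$ hdfs dfs -cat '/apps/hive/warehouse/raw.db/XXX_XXXX/part_fecha_proceso=2018-12-05/*' | hdfs dfs -put - /<hdfs_path>/part_fecha_proceso=2018-12-05/000000_0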
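
A minimal sketch of this concatenation as a reusable shell function, assuming the files are plain text and the destination lies outside the source directory. The name merge_hdfs_dir is hypothetical, not a Hadoop command. It streams hdfs dfs -cat into hdfs dfs -put -, which reads from standard input, so the merged data lands in HDFS rather than on the local disk.

```shell
#!/usr/bin/env bash
# Sketch only: concatenate every file in an HDFS directory into one HDFS file.
# Assumption: the files are plain text, so byte-wise concatenation is safe.
set -euo pipefail

# merge_hdfs_dir is a hypothetical helper name, not part of Hadoop.
merge_hdfs_dir() {
  local src_dir="$1"    # HDFS directory holding the small files
  local dest_file="$2"  # merged HDFS file to create, outside src_dir
  # The glob is quoted so the HDFS shell expands it, not the local shell.
  # "-put -" reads from stdin, so nothing is staged on the local disk.
  hdfs dfs -cat "${src_dir}/*" | hdfs dfs -put - "${dest_file}"
}

# Example call (paths taken from the question; <hdfs_path> is left as given):
# merge_hdfs_dir /apps/hive/warehouse/raw.db/XXX_XXXX/part_fecha_proceso=2018-12-05 \
#                /<hdfs_path>/part_fecha_proceso=2018-12-05/000000_0
```

The put step fails if the destination file already exists, so the original files are never overwritten by accident. Check the merged file before removing them.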

HTH
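
If the table is not plain text, one alternative is to let Hive rewrite the partition itself. As far as I know, the hive.merge.* settings only merge the files written by the job that is running and leave files already in the partition alone, which may be why a second insert still leaves 000000_0_copy_1. The sketch below assumes a non-transactional table partitioned by a single column. compact_partition_sql is a hypothetical helper that only prints the HiveQL, so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch only: print HiveQL that rewrites one partition onto itself so that
# Hive's merge step can consolidate its small files.
set -euo pipefail

# compact_partition_sql is a hypothetical helper name, not a Hive command.
compact_partition_sql() {
  local table="$1"     # e.g. raw.XXX_XXXX
  local part_col="$2"  # e.g. part_fecha_proceso
  local part_val="$3"  # e.g. 2018-12-05
  cat <<SQL
-- Merge settings copied from the question.
SET hive.execution.engine=tez;
SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=128000000;
SET hive.merge.size.per.task=128000000;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- SELECT * returns the partition column last, which matches what a
-- dynamic-partition insert expects; only this partition is overwritten.
INSERT OVERWRITE TABLE ${table} PARTITION (${part_col})
SELECT * FROM ${table} WHERE ${part_col}='${part_val}';
SQL
}

# Example call (assumes the hive CLI is on the PATH):
# hive -e "$(compact_partition_sql raw.XXX_XXXX part_fecha_proceso 2018-12-05)"
```

For ORC or RCFile tables, ALTER TABLE ... PARTITION (...) CONCATENATE is another option that avoids rewriting the rows.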