Support Questions

anirbandd · ‎07-31-2017

Hello,

When we insert data from staging table into a production table using dynamic partition inserts, the files created at the partition directory are like: 0000_0.

However, say, for a process where data is loaded on a daily basis, after the first data insertion in a partition, the file names are like 0000_0_copy_1 for the second day, 0000_0_copy_2 for the third day and so on...

I want to create a filename like so: partitionName_datestamp [ex. IND_20173107] so that it helps to maintain a logical and relevant file structure for any manual intervention needs.

I am aware that we can achieve this by executing a shell script after Hive jobs.

But, can we control this from within Hive?

Regards,

Anirban.

PS: I am using Cloudera 5.8. The Hive table is stored as Parquet.
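For reference, the post-job rename mentioned in the question could be sketched roughly as below. This is only an illustration, not something Hive provides: `plan_renames` and `hdfs_mv_commands` are hypothetical helper names, the file-name pattern is inferred from the names quoted in this thread, and the `%Y%m%d` date format is an assumption to adjust as needed. The sketch only builds `hdfs dfs -mv` command lines; it does not run them.

```python
import re
from datetime import date

# Hive-generated names as quoted in the thread: 0000_0, 0000_0_copy_1, ...
# (the exact pattern is an assumption based on those examples)
HIVE_FILE = re.compile(r"^\d+_\d+(?:_copy_\d+)?$")


def plan_renames(partition_name, filenames, load_date, date_format="%Y%m%d"):
    """Map each Hive-generated file name to partitionName_datestamp.

    Files that were already renamed are left alone. If several new files
    land in the same partition on the same day, a numeric suffix keeps
    the target names unique.
    """
    base = f"{partition_name}_{load_date.strftime(date_format)}"
    taken = {f for f in filenames if not HIVE_FILE.match(f)}
    plan = {}
    for old in sorted(filenames):
        if not HIVE_FILE.match(old):
            continue  # not a Hive default name, keep as is
        candidate, n = base, 1
        while candidate in taken:
            candidate = f"{base}_{n}"
            n += 1
        taken.add(candidate)
        plan[old] = candidate
    return plan


def hdfs_mv_commands(partition_dir, plan):
    """Build 'hdfs dfs -mv' argument lists for a rename plan (not executed here)."""
    return [
        ["hdfs", "dfs", "-mv", f"{partition_dir}/{old}", f"{partition_dir}/{new}"]
        for old, new in sorted(plan.items())
    ]
```

Each command list could then be passed to `subprocess.run` once the Hive job has finished. The partition directory path is whatever the table location and partition column dictate; none is assumed here.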

mbigelow · ‎07-31-2017

I can't seem to find anything, but I thought you could change the prefix. I feel sure you can for MR jobs, but I'm not sure about Hive. If it is an MR property, though, you could set it in your Hive session.

The other thing to talk about here is that *_copy_1 is part of the Hive code for dynamic partitions. It checks beforehand whether 0000_0 already exists, possibly from another reducer or another Hive process. It then appends _copy_# to protect the data. This will remain regardless of the prefix. So in theory, even if you went down to the millisecond, you could still end up with two files competing for the same name.
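To make the naming behaviour described above concrete, here is a small illustrative model. It is not Hive's actual source code; `next_free_name` is a hypothetical helper that only mimics the "check whether the name exists, then append _copy_#" rule as described in this post.

```python
def next_free_name(base, existing):
    """Return base if it is unused, otherwise base_copy_N for the smallest unused N."""
    if base not in existing:
        return base
    n = 1
    while f"{base}_copy_{n}" in existing:
        n += 1
    return f"{base}_copy_{n}"


# Simulate three daily loads into the same partition directory.
existing = set()
for day in range(3):
    name = next_free_name("0000_0", existing)
    existing.add(name)
    print(name)
# prints 0000_0, then 0000_0_copy_1, then 0000_0_copy_2
```

The same rule would apply to any other base name, which is why a custom prefix alone would not remove the _copy_# suffix when two writers pick the same name.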

Changing the prefix should help your case though, so try finding something on changing the output file prefix.

anirbandd · ‎08-01-2017

I searched but could not get hold of any such properties...

csguna · ‎08-08-2017

@anirbandd You can write your own custom reducer class and use things like LazyOutputFormat, etc.

I believe there is no property that you can tweak in your mapred or hive XML to customize the output file names when running in Hive.

anirbandd · ‎08-15-2017

I was afraid of this...

Thank you for clarifying, @csguna 🙂

vinuthna91 · ‎08-07-2018

Were you able to set a custom prefix? I want to do multiple inserts into the same partition. I hope that if a custom prefix works, I can do multiple inserts into the Hive table. Any suggestions appreciated.

Set destination filenames during dynamic partition Inserts in Hive