Explorer
Posts: 11
Registered: ‎02-15-2017

Set destination filenames during dynamic partition Inserts in Hive

Hello,

When we insert data from a staging table into a production table using dynamic partition inserts, the files created in the partition directory have names like 0000_0.

However, for a process that loads data daily, every insert after the first one into a partition produces names like 0000_0_copy_1 on the second day, 0000_0_copy_2 on the third day, and so on.

I want the files to be named partitionName_datestamp [ex. IND_20173107], so that the file structure stays logical and relevant whenever manual intervention is needed.

I am aware that we can achieve this by running a shell script after the Hive jobs.

But can we control this from within Hive itself?

Regards,

Anirban.


PS: I am using Cloudera 5.8. The Hive table is stored as Parquet.

Posts: 642
Topics: 3
Kudos: 103
Solutions: 67
Registered: ‎08-16-2016

Re: Set destination filenames during dynamic partition Inserts in Hive

I can't seem to find anything, but I thought you could change the prefix. I feel sure you can for MR jobs, but I am not sure about Hive. If it is an MR property, though, you could set it in your Hive session.

The other thing to mention here is that *_copy_1 comes from the Hive code for dynamic partitions. It checks beforehand whether 0000_0 already exists, possibly from another reducer or another Hive process, and appends _copy_# to protect the existing data. This will remain regardless of the prefix. So in theory, even if your prefix included a timestamp down to the millisecond, two writers could still end up producing files with the same name.
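To illustrate the naming behaviour described above, here is a minimal Python sketch. It is not Hive's actual code; the function name `next_available_name` is made up for this example, and it only mimics the "use the base name if free, otherwise append _copy_#" rule as described in this thread.

```python
def next_available_name(base, existing):
    """Return `base` if it is unused, otherwise the first free `base_copy_N` (N >= 1).

    Hypothetical helper: it imitates the collision handling described in the
    post, it is not taken from the Hive source.
    """
    if base not in existing:
        return base
    n = 1
    while f"{base}_copy_{n}" in existing:
        n += 1
    return f"{base}_copy_{n}"


# Simulate three daily loads into the same partition directory.
existing = set()
for _ in range(3):
    name = next_available_name("0000_0", existing)
    existing.add(name)
    print(name)
# prints 0000_0, then 0000_0_copy_1, then 0000_0_copy_2
```

The same rule would apply to any custom prefix: if the chosen name is already taken, a suffix is still needed to keep the earlier file intact.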

Changing the prefix should still help in your case, though, so try searching for a way to change the output file prefix.
Explorer
Posts: 11
Registered: ‎02-15-2017

Re: Set destination filenames during dynamic partition Inserts in Hive

I searched but could not find any such property...

Champion
Posts: 562
Registered: ‎05-16-2016

Re: Set destination filenames during dynamic partition Inserts in Hive

@anirbandd You can write your own custom reducer class, using things like LazyOutputFormat, etc.

I believe there is no property you can tweak in your mapred or hive XML to set a custom output file name when running in Hive.
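With no Hive-side setting available, the post-job rename mentioned in the original question remains the fallback. Below is a rough Python sketch of how such a script could plan the renames. Everything in it is an assumption for illustration: the function `plan_renames`, the regular expression for Hive's default file names, and the `_1`, `_2` suffix used when one partition holds several files. It only prints `hdfs dfs -mv` commands with bare file names; a real script would use full HDFS paths and actually run them.

```python
import re

# Assumed pattern for the default names seen in this thread: 0000_0, 0000_0_copy_1, ...
HIVE_DEFAULT_NAME = re.compile(r"^\d+_\d+(_copy_\d+)?$")


def plan_renames(partition_name, datestamp, filenames):
    """Return (old, new) pairs renaming default-named files to partitionName_datestamp.

    Files that do not match the default pattern (for example, ones renamed on
    an earlier day) are left alone. If the target name is already taken, a
    numeric suffix is appended so nothing is overwritten.
    """
    prefix = f"{partition_name}_{datestamp}"
    targets = sorted(f for f in filenames if HIVE_DEFAULT_NAME.match(f))
    taken = set(filenames) - set(targets)
    plan = []
    for old in targets:
        new, n = prefix, 0
        while new in taken:
            n += 1
            new = f"{prefix}_{n}"
        taken.add(new)
        plan.append((old, new))
    return plan


# Example: two new files from today's load, one file already renamed earlier.
for old, new in plan_renames("IND", "20173107", ["0000_0", "0000_1", "IND_20173007"]):
    print(f"hdfs dfs -mv {old} {new}")
```

Keeping the planning step separate from the actual move makes it easy to do a dry run before touching the partition directory.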

 

Explorer
Posts: 11
Registered: ‎02-15-2017

Re: Set destination filenames during dynamic partition Inserts in Hive

I was afraid of this...

Thank you for clarifying, @csguna :)