07-31-2017 03:39 AM
When we insert data from a staging table into a production table using dynamic partition inserts, the files created in the partition directory have names like 0000_0.
However, say, for a process where data is loaded on a daily basis, after the first data insertion in a partition, the file names are like 0000_0_copy_1 for the second day, 0000_0_copy_2 for the third day and so on...
I want to create a filename like so: partitionName_datestamp [ex. IND_20173107] so that it helps to maintain a logical and relevant file structure for any manual intervention needs.
I am aware that we can achieve this by executing a shell script after Hive jobs.
But, can we control this from within Hive?
PS: I am using Cloudera 5.8. The Hive table is stored as Parquet.
07-31-2017 11:33 AM
08-08-2017 10:41 AM
@anirbandd You can write your own custom reducer class, using things like LazyOutputFormat, etc.
I believe there is no property you can tweak in your mapred or Hive XML configuration to customize the output file names when inserting from Hive.
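If the rename has to happen after the Hive job, as the question suggests, the naming logic could be sketched roughly like this. It is only an illustration, not something Hive provides: `plan_renames`, its parameters, the regular expression for Hive-generated names, and the warehouse path in the example are all assumptions, and the default date format `%Y%d%m` is chosen only to reproduce the IND_20173107 example from the question.

```python
import re
from datetime import date

# Assumed pattern for Hive-generated file names such as 0000_0 or 0000_0_copy_1.
HIVE_FILE = re.compile(r"^\d+_\d+(_copy_\d+)?$")


def plan_renames(partition_dir, partition_name, load_date, file_names,
                 stamp_format="%Y%d%m"):
    """Return (source, target) path pairs for files that still carry a
    Hive-generated name. Files already renamed are left alone."""
    stamp = load_date.strftime(stamp_format)
    generated = sorted(n for n in file_names if HIVE_FILE.match(n))
    plan = []
    for i, name in enumerate(generated):
        # Add an index only when several files landed in the partition,
        # so that the targets do not collide.
        suffix = "" if len(generated) == 1 else f"_{i}"
        target = f"{partition_name}_{stamp}{suffix}"
        plan.append((f"{partition_dir}/{name}", f"{partition_dir}/{target}"))
    return plan


if __name__ == "__main__":
    # Hypothetical partition directory and listing.
    for src, dst in plan_renames("/warehouse/prod/country=IND", "IND",
                                 date(2017, 7, 31),
                                 ["0000_0_copy_1", "IND_20173007"]):
        print(src, "->", dst)
```

Each pair could then be passed to `hdfs dfs -mv <source> <target>` from the post-job script. Because files are renamed after every load, the next insert should not find an existing 0000_0 in the partition.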
08-07-2018 04:34 AM
Were you able to set a custom prefix? I want to do multiple inserts into the same partition. I am hoping that if a custom prefix works, I can do multiple inserts into a Hive table. Any suggestions appreciated.