New Contributor
Posts: 3
Registered: ‎02-20-2018

How to properly persist timestamp values in a Hive table based on delimited data files

We have a table defined in Hive which has a timestamp column (name it, ts) and the table is set up to be based on delimited data loaded from delimited data files.


Reading the Hive documentation from,


Timestamps are interpreted to be timezoneless and stored as an offset from the UNIX epoch.
Timestamps in text files have to use the format yyyy-mm-dd hh:mm:ss[.f...].


Assuming that we have a java.util.Date "in hand", what is the proper way to write the value into the delimited data file?


I am assuming that in Jave we'd be converting the Date value to String using the format "yyyy-MM-dd HH:mm:ss.SSS" and specifying the desired Time Zone to be UTC.


java.util.Date date = ...;
DateUtils.formatDate(date, "yyyy-MM-dd HH:mm:ss.SSS", TimeZone.getTimeZone("UTC"));

where DateUtils.formatDate is:

public static String formatDate(Date d, String format, TimeZone zone) {
    SimpleDateFormat formatter = new SimpleDateFormat(format);
    return formatter.format(d);

Is this correct?