03-20-2018 01:42 PM
We need to find out when a row was last processed. The obvious answer is to add a new column holding a version or timestamp that gets populated during processing, but that seems unnecessary. I was hoping there was a way to have Impala expose the file ctime, mtime, or atime that's already stored in HDFS.
03-21-2018 06:01 PM
I really like your idea of exposing HDFS file metadata through SQL. Unfortunately, there's no way to do this in Impala SQL today.
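Outside of Impala, the per-file timestamps can be read from HDFS directly. As a minimal sketch, assuming WebHDFS is enabled on the cluster: a `GETFILESTATUS` request (`http://<namenode>:<port>/webhdfs/v1/<path>?op=GETFILESTATUS`) returns a `FileStatus` JSON object whose `modificationTime` and `accessTime` fields are milliseconds since the Unix epoch. HDFS tracks only these two, with no separate ctime. The helper name `file_times` and the sample values below are hypothetical and for illustration only.

```python
import json
from datetime import datetime, timezone


def _ms_to_utc(ms: int) -> datetime:
    # WebHDFS reports times as milliseconds since the Unix epoch.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def file_times(filestatus_json: str) -> dict:
    """Extract mtime/atime from a WebHDFS GETFILESTATUS response body (hypothetical helper)."""
    status = json.loads(filestatus_json)["FileStatus"]
    return {
        "mtime": _ms_to_utc(status["modificationTime"]),
        "atime": _ms_to_utc(status["accessTime"]),
    }


# Illustrative response body, not taken from a real cluster.
sample = json.dumps({
    "FileStatus": {
        "accessTime": 1521553320000,
        "modificationTime": 1521553320000,
        "length": 1024,
        "type": "FILE",
        "pathSuffix": "",
    }
})
print(file_times(sample))
```

This gives you per-file times only; mapping them back to rows still requires knowing which file each row came from, which is exactly what Impala SQL can't tell you today.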
We've thought about this sort of thing in the past and mused about exposing the metadata through a "special" virtual column in each table, along these lines:
<regular columns shown here>,
or something along those lines.
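To make the virtual-column idea concrete, here is a toy Python model (not Impala code) of its semantics: every row a scan reads from a data file carries that file's path and mtime as extra trailing columns. The `DataFile` type, `scan_with_file_metadata` function, and file paths are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator


@dataclass
class DataFile:
    path: str
    mtime: datetime
    rows: list[tuple]


def scan_with_file_metadata(files: list[DataFile]) -> Iterator[tuple]:
    """Yield each row with its source file's path and mtime appended,
    mimicking a hypothetical per-table virtual column."""
    for f in files:
        for row in f.rows:
            yield row + (f.path, f.mtime)


files = [
    DataFile("/warehouse/t/part-0.parq", datetime(2018, 3, 19, tzinfo=timezone.utc), [(1, "a"), (2, "b")]),
    DataFile("/warehouse/t/part-1.parq", datetime(2018, 3, 20, tzinfo=timezone.utc), [(3, "c")]),
]

# Rows whose file was written on or after the cutoff.
cutoff = datetime(2018, 3, 20, tzinfo=timezone.utc)
recent = [r for r in scan_with_file_metadata(files) if r[-1] >= cutoff]
print(recent)
```

Note that the metadata is per file, not per row: mtime approximates "when this row was processed" only if each processing run writes new files rather than rewriting old ones.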
Feel free to file a feature request at https://issues.apache.org/jira/projects/IMPALA
If you want to take a stab at the implementation, we'd be happy to advise you on email@example.com