New Contributor
Posts: 4
Registered: ‎04-03-2017

Getting avro/parquet ctime, mtime, and atime in Impala external tables

We need to know when each row was last processed. The obvious answer is to add a version or timestamp column that gets populated during processing, but that seems unnecessary. I was hoping there was a way to have Impala expose the file ctime, mtime, or atime that's already sitting in HDFS.

Cloudera Employee
Posts: 301
Registered: ‎10-16-2013

Re: Getting avro/parquet ctime, mtime, and atime in Impala external tables

Hi!

 

I really like your idea of exposing HDFS file metadata through SQL. Unfortunately, there's no way to do this in Impala SQL today.

 

We've thought about this sort of thing in the past and toyed with the idea of exposing the metadata through a "special" virtual column in each table, along these lines:

 

describe mytable;

<regular columns shown here>,

fs_metadata struct<filename:string,mtime:bigint,ctime:bigint,atime:bigint>

 

or something along those lines.
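
Until something like that exists, the file times can be collected outside Impala. Here's a rough sketch in Python, assuming you already have the parsed JSON from a WebHDFS LISTSTATUS call on the table's directory, where modificationTime and accessTime are epoch milliseconds. The function name file_times and the sample data are made up for illustration, and as far as I know that response carries no ctime, so it's left out. Note that this gives per-file times, not per-row.

```python
from datetime import datetime, timezone


def file_times(liststatus_response):
    """Map each file name to its (mtime, atime) as UTC datetimes.

    Expects the parsed JSON body of a WebHDFS LISTSTATUS response.
    """
    result = {}
    for status in liststatus_response["FileStatuses"]["FileStatus"]:
        if status["type"] != "FILE":
            continue  # skip directories
        # WebHDFS reports times in milliseconds since the epoch
        mtime = datetime.fromtimestamp(status["modificationTime"] / 1000, tz=timezone.utc)
        atime = datetime.fromtimestamp(status["accessTime"] / 1000, tz=timezone.utc)
        result[status["pathSuffix"]] = (mtime, atime)
    return result


# Made-up sample in the LISTSTATUS shape, for illustration only
sample = {
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "part-0.parq", "type": "FILE",
             "modificationTime": 1491177600000, "accessTime": 1491177600000},
            {"pathSuffix": "_tmp", "type": "DIRECTORY",
             "modificationTime": 1491177600000, "accessTime": 0},
        ]
    }
}

for name, (mtime, atime) in file_times(sample).items():
    print(name, mtime.isoformat(), atime.isoformat())
```

You could load the result into a small lookup table and join it against your data by file name, provided your processing records which file each row came from.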


Feel free to file a feature request at https://issues.apache.org/jira/projects/IMPALA

 

If you want to take a stab at the implementation, we'd be happy to advise you on dev@impala.apache.org

 
