New Contributor
Posts: 3
Registered: ‎10-26-2015

Impala support for INPUT__FILE__NAME in hive

Hi,

Is there a way (a virtual column or a function) to get the name of the file a record came from in Impala queries, similar to INPUT__FILE__NAME in Hive?

There is a JIRA for this, but nobody has done any work on it yet.

Thanks

Cloudera Employee
Posts: 416
Registered: ‎07-29-2015

Re: Impala support for INPUT__FILE__NAME in hive

Hi hakki,
We haven't implemented that yet. Currently, I think the only way to get
that info is to run "SHOW FILES".
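
For reference, a minimal sketch of the two statements being compared. The table name `web_logs` and its columns are hypothetical, used only for illustration.

```sql
-- Hive: INPUT__FILE__NAME is a virtual column, so each row can be
-- tagged with the file it was read from.
-- (web_logs, id, and payload are hypothetical names.)
SELECT INPUT__FILE__NAME, id, payload
FROM web_logs
LIMIT 10;

-- Impala: SHOW FILES lists the data files that make up the table,
-- but it does not relate individual rows to those files.
SHOW FILES IN web_logs;
```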

- Tim
New Contributor
Posts: 3
Registered: ‎10-26-2015

Re: Impala support for INPUT__FILE__NAME in hive

With SHOW FILES, I can't tell which file a given record belongs to.
Cloudera Employee
Posts: 82
Registered: ‎12-07-2015

Re: Impala support for INPUT__FILE__NAME in hive

Hi hakki,

You could try partitioning your table and using the "SHOW PARTITIONS" command to narrow down the set of files a row might be in. Can you give more context on the use case without sharing private information?
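
A rough sketch of that approach, assuming a hypothetical table `events_by_source` partitioned on a hypothetical low-cardinality column `source_type` derived from the filename pattern:

```sql
-- Hypothetical table: one partition per filename pattern.
CREATE TABLE events_by_source (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (source_type STRING)
STORED AS PARQUET;

-- List the partitions of the table.
SHOW PARTITIONS events_by_source;

-- List only the data files under one partition
-- ('sensor_a' is a made-up partition value).
SHOW FILES IN events_by_source PARTITION (source_type = 'sensor_a');
```

This narrows a row down to the files of its partition rather than to a single file.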

 

Cheers, Lars

Cloudera Employee
Posts: 416
Registered: ‎07-29-2015

Re: Impala support for INPUT__FILE__NAME in hive

That's a good point. I updated the JIRA description to provide that additional motivation.

As an open-source project, we're somewhat dependent on people finding time to pick up new features like this that are nice-to-have but not critical for many users.

New Contributor
Posts: 3
Registered: ‎10-26-2015

Re: Impala support for INPUT__FILE__NAME in hive

Yes, I thought about partitioning on the filename pattern. For one type of data I have, that is acceptable, since there are only 10-15 distinct filename patterns. In another type of data, however, the filename contains a unique "id" field that takes a very large number of distinct values. Partitioning on it would risk creating a huge number of partitions, which could put a heavy load on the catalog server.