Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Add filename to import in Hive

avatar
Explorer

Hi guys,

 

I am new to Cloudera and have only used Impala and Hive through Hue thus far.

 

I have a folder of CSVs that I imported, but would like to add the filename as a column to each row that I import.

 

How do I do this?

 

I came across the function 

INPUT__FILE__NAME

 but I don't see it actually doing anything.

 

My import is very easy - hoping you can help me?

 

create external table hive_magnum
(
col1 string,
col2 string,
col3 string
)
row format delimited fields terminated by "," 
location '/user/xxxxxx/magnum';

1 ACCEPTED SOLUTION

avatar
Expert Contributor

Hi Lucille,

 

For the example you provided, you could get the file names with a query like this:

 

SELECT hive_magnum.col1, hive_magnum.col2, hive_magnum.col3, hive_magnum.INPUT__FILE__NAME
FROM
hive_magnum;

It will actually provide the full HDFS location, which includes the file name. 

 

I hope this is helpful.

View solution in original post

3 REPLIES 3

avatar
Expert Contributor

Hi Lucille,

 

For the example you provided, you could get the file names with a query like this:

 

SELECT hive_magnum.col1, hive_magnum.col2, hive_magnum.col3, hive_magnum.INPUT__FILE__NAME
FROM
hive_magnum;

It will actually provide the full HDFS location, which includes the file name. 

 

I hope this is helpful.

avatar
Explorer

thank you so much - this worked 🙂

 

I just have one more questions.. to get this into the another table - it doesn't seem to accept this.. what is wrong with it?

It also did not like the <pre> bit... If I take it out, it runs..


create table tester
(
SELECT hive_magnum.col1, hive_magnum.col2, hive_magnum.col3, hive_magnum.INPUT__FILE__NAME as file_name FROM hive_magnum)

 

 

avatar
Expert Contributor
The
 bit was an attempt to format the post. It wasn't supposed to be
part of the query, sorry about that.

Try adding "as" after "create table tester" and before the nested select
query.