Created on 06-13-2016 05:22 AM - edited 09-16-2022 03:24 AM
Hi guys,
I am new to Cloudera and have only used Impala and Hive through Hue thus far.
I have a folder of CSVs that I imported, but would like to add the filename as a column to each row that I import.
How do I do this?
I came across the function
INPUT__FILE__NAME
but I don't see it actually doing anything.
My import is very easy - hoping you can help me?
create external table hive_magnum
(
col1 string,
col2 string,
col3 string
)
row format delimited fields terminated by ","
location '/user/xxxxxx/magnum';
Created on 06-17-2016 02:56 PM - edited 06-17-2016 02:57 PM
Hi Lucille,
For the example you provided, you could get the file names with a query like this:
SELECT hive_magnum.col1, hive_magnum.col2, hive_magnum.col3, hive_magnum.INPUT__FILE__NAME FROM hive_magnum;
It will actually provide the full HDFS location, which includes the file name.
I hope this is helpful.
Created on 06-17-2016 02:56 PM - edited 06-17-2016 02:57 PM
Hi Lucille,
For the example you provided, you could get the file names with a query like this:
SELECT hive_magnum.col1, hive_magnum.col2, hive_magnum.col3, hive_magnum.INPUT__FILE__NAME FROM hive_magnum;
It will actually provide the full HDFS location, which includes the file name.
I hope this is helpful.
Created 06-20-2016 01:58 AM
thank you so much - this worked 🙂
I just have one more questions.. to get this into the another table - it doesn't seem to accept this.. what is wrong with it?
It also did not like the <pre> bit... If I take it out, it runs..
create table tester
(
SELECT hive_magnum.col1, hive_magnum.col2, hive_magnum.col3, hive_magnum.INPUT__FILE__NAME as file_name FROM hive_magnum)
Created 06-20-2016 06:59 AM
bit was an attempt to format the post. It wasn't supposed to be
part of the query, sorry about that.
Try adding "as" after "create table tester" and before the nested select
query.