
Loading data into a Hive table from HDFS deletes the file from the source directory (HDFS)

Rising Star

Hi All,

When we load data into a Hive table from HDFS, the file is removed from the source HDFS directory. Is there a way to keep the file in the source directory and still load the data into the Hive table?

I used the following query:

LOAD DATA INPATH 'source_file_path' OVERWRITE INTO TABLE TABLENAME;

1 ACCEPTED SOLUTION

Super Guru

@Ravikumar Kumashi

LOAD DATA INPATH moves the file into the table's directory rather than copying it, which is why it disappears from the source. If you don't want to lose the source data while loading, the best approach is to create an external table over the existing HDFS directory. Alternatively, you can copy the source directory and create an external Hive table that points to the new location.

hadoop fs -cp /path/old/hivetable /path/new/hivetable

CREATE EXTERNAL TABLE table_name ( id INT, myfields STRING )
LOCATION '/path/new/hivetable';
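
The first option mentioned above (an external table directly over the existing directory, with no copy) could look roughly like this. It is only a sketch: the column list is taken from the example above, and the ROW FORMAT and STORED AS clauses are assumptions about the file layout (comma-delimited text) that need to match your actual data.

```sql
-- Sketch: external table defined over the directory that already holds the data.
-- Assumes comma-delimited text files; adjust the delimiter and format to your data.
CREATE EXTERNAL TABLE table_name (
  id       INT,
  myfields STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/old/hivetable';

-- Nothing is moved or copied: the files stay where they are and are read in place.
SELECT * FROM table_name LIMIT 10;
```

With an external table, Hive by default manages only the metadata, so dropping the table later removes the table definition but leaves the files in the directory.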


Rising Star

@Jitendra Yadav

Thank you! That works for me.

I thought there would be a way to keep the file in the source directory and also load the data into a managed table, but it looks like there isn't one.
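
One possible workaround for the managed-table case, offered as a sketch rather than a confirmed recipe: since LOAD DATA INPATH moves whatever file it is given, you can copy the file to a staging location first and load the copy, so the original stays in the source directory. The staging path and file name below are hypothetical placeholders, and `tablename` stands for the managed table from the original query.

```sql
-- Step 1 (run from a shell, not from Hive): copy the source file to a staging path.
--   hadoop fs -cp /path/source/data.txt /path/staging/data.txt
-- Both paths are hypothetical placeholders.

-- Step 2: load the staged copy. Hive moves /path/staging/data.txt into the
-- table's directory; the original under /path/source is not touched.
LOAD DATA INPATH '/path/staging/data.txt' OVERWRITE INTO TABLE tablename;
```

The cost is a temporary second copy of the data in HDFS for the duration of the load.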

New Contributor

In my case, the source file is removed when I load a single file with the OVERWRITE clause.

The files stay in place when I load a set of files matching a pattern (say _*.txt) without the OVERWRITE clause.