Created 07-23-2016 01:03 PM
Hi All,
When we load data into a Hive table from HDFS, the file is removed from the source directory (HDFS). Is there a way to keep the file in the source directory and still load the data into the Hive table?
I used the query below:
LOAD DATA INPATH 'source_file_path' OVERWRITE INTO TABLE TABLENAME;
Created 07-23-2016 01:11 PM
If you don't want to lose the source data while loading, the best approach is to create an external table over the existing HDFS directory. Alternatively, you can make a copy of your source directory and create an external Hive table that points to the new location:
hadoop fs -cp /path/old/hivetable /path/new/hivetable

CREATE EXTERNAL TABLE table_name (
  id INT,
  myfields STRING
)
LOCATION '/path/new/hivetable';
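For the first option, here is a minimal sketch that points the external table directly at the existing directory, with no copy at all. The ROW FORMAT clause is an assumption; adjust it to match how your files are actually delimited:

-- No data is moved; the table just reads the files already in place
CREATE EXTERNAL TABLE table_name (
  id INT,
  myfields STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  -- assumed delimiter
LOCATION '/path/old/hivetable';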
Created 07-23-2016 01:16 PM
Thank you!!! That works for me.
I thought there was a way to keep the file in the source directory and load the data into a managed table as well, but it looks like there is no way to do that.
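If you do need a managed table, one workaround (building on the copy idea above) is to stage a copy of the file first and let LOAD DATA consume the staging copy, so the original survives. The paths here are hypothetical:

hadoop fs -mkdir -p /staging/mytable
hadoop fs -cp /source/mytable/data.txt /staging/mytable/

-- LOAD moves only the staging copy; /source/mytable/data.txt is untouched
LOAD DATA INPATH '/staging/mytable/data.txt' OVERWRITE INTO TABLE TABLENAME;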
Created 09-30-2020 02:54 AM
In my case, the source file gets removed when I load a single file with the OVERWRITE clause.
The files stay when I load without the OVERWRITE clause for a set of files matching a pattern (say _*.txt).
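For reference, these are the two commands that produce the behaviour described above (paths and table name are hypothetical):

-- Single file with OVERWRITE: the source file is removed after the load
LOAD DATA INPATH '/data/in/_file1.txt' OVERWRITE INTO TABLE TABLENAME;

-- Pattern without OVERWRITE: the matched files stay in the source directory
LOAD DATA INPATH '/data/in/_*.txt' INTO TABLE TABLENAME;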