Loading data into a Hive table from HDFS deletes the file from the source directory (HDFS)
Labels: Apache Hive
Created ‎07-23-2016 01:03 PM
Hi All,
When we load data into a Hive table from HDFS, the file is deleted from the source directory (HDFS). Is there a way to keep the file in the source directory and still load the data into the Hive table?
I used the query below:
LOAD DATA INPATH 'source_file_path' OVERWRITE INTO TABLE TABLENAME;
Created ‎07-23-2016 01:11 PM
If you don't want to lose the source data while loading, the best approach is to create an external table over the existing HDFS directory. Alternatively, you can copy the source directory and create an external Hive table that points to the new location:
hadoop fs -cp /path/old/hivetable /path/new/hivetable
create external table table_name ( id int, myfields string ) location '/path/new/hivetable';
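If the target has to be a managed table, the same copy-first idea should also work with LOAD DATA: LOAD DATA INPATH moves whatever it is pointed at, so pointing it at a staged copy leaves the original file where it was. Below is a rough sketch of that, not a tested recipe. The function names, the DRY_RUN switch, and the example paths are made up for illustration, and the staging directory is assumed to hold nothing but the copy.

```shell
#!/usr/bin/env bash
# Sketch: keep the source file by loading a staged copy, not the original.
set -euo pipefail

# run: print the command when DRY_RUN=1, otherwise execute it.
# (Hypothetical helper, only here so the steps can be previewed safely.)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# load_keep_source SRC STAGING TABLE
#   SRC     - HDFS file (or directory) that must stay in place
#   STAGING - scratch HDFS directory, assumed empty or not yet created
#   TABLE   - existing Hive table to load into
load_keep_source() {
  local src="$1" staging="$2" table="$3"
  run hadoop fs -mkdir -p "$staging"
  # Copy instead of move, so SRC is left untouched.
  run hadoop fs -cp "$src" "$staging/"
  # LOAD DATA INPATH moves every file under STAGING into the table location.
  run hive -e "LOAD DATA INPATH '$staging' INTO TABLE $table;"
}

# Example (hypothetical paths):
#   DRY_RUN=1; load_keep_source /path/source/file.txt /path/staging table_name
```

Add OVERWRITE to the LOAD DATA statement if the existing table contents should be replaced, as in the original query. The cost of this approach is that the data is temporarily stored twice in HDFS, which is why the external table is usually the simpler option.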
Created ‎07-23-2016 01:16 PM
Thank you! That works for me.
I thought there would be a way to keep the file in the source directory and also load the data into a managed table, but it looks like there is no way to do that.
Created ‎09-30-2020 02:54 AM
In my case, the source file gets removed when I load a single file with the 'OVERWRITE' clause.
The files stay in place when I load a set of files matched by a pattern (say _*.txt) without the 'OVERWRITE' clause.
