I have one .csv file in HDFS, 100 MB in size. I am loading this file into a Hive table for processing. The table's path in HDFS is hive/apps/warehouse. Does it occupy another 100 MB of physical storage, i.e. 100 MB (HDFS) + 100 MB (Hive)?
There are two kinds of Hive tables that you may use:
1) Hive Managed Table
After creating your table (let's call it "test_table") you will load the data into it using the following syntax:
LOAD DATA INPATH '/hdfs_path/to/file/mycsv.csv' OVERWRITE INTO TABLE test_table;
This will actually move the file to the Hive warehouse directory. So there will only be one copy, which will exist in the warehouse folder.
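You can also use the following syntax:
LOAD DATA LOCAL INPATH '/local_filesystem/path/mycsv.csv' OVERWRITE INTO TABLE test_table;
This loads the data from your local filesystem rather than from HDFS. In this case the file is copied into the warehouse directory and the local file is left in place.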
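One way to check that the HDFS load moved the file rather than duplicating it is to look at the table's location and at the source directory afterwards. This is a minimal sketch, assuming two STRING columns (placeholders, not the real schema) and the placeholder paths used above. The `dfs` command is assumed to be run from the Hive CLI.

```sql
-- Column names and types are placeholders; adjust them to the CSV's real schema.
CREATE TABLE test_table (
  column1 STRING,
  column2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Moves (does not copy) the HDFS file into the table's warehouse directory.
LOAD DATA INPATH '/hdfs_path/to/file/mycsv.csv' OVERWRITE INTO TABLE test_table;

-- "Location" shows where the file now lives; "Table Type" should read MANAGED_TABLE.
DESCRIBE FORMATTED test_table;

-- The source directory should no longer list mycsv.csv.
dfs -ls /hdfs_path/to/file/;
```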
2) Hive External Table
When creating your table you would define it as follows:
CREATE EXTERNAL TABLE test_table ( column1 type, column2 type ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/hdfs_path/to/file/';
Note that LOCATION must be the directory that contains the file, not the file itself. This leaves the file in its place, creates the metadata in the metastore, and points the table at that directory. So, again, there is only one copy of the file.
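The external-table case can be sketched in the same way. The table name `test_table_ext` and the STRING column types are hypothetical, chosen only for illustration, and the example assumes LOCATION names the directory holding the CSV. The `dfs` command is again assumed to be run from the Hive CLI.

```sql
-- test_table_ext and the column types are placeholders for illustration.
CREATE EXTERNAL TABLE test_table_ext (
  column1 STRING,
  column2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hdfs_path/to/file/';

-- "Table Type" should read EXTERNAL_TABLE and "Location" should be the directory above.
DESCRIBE FORMATTED test_table_ext;

-- Space used by the directory; it should be unchanged by creating the table.
dfs -du -s -h /hdfs_path/to/file/;
```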
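You can also create a Managed Table from data that exists in another table (external, managed or even a view). Unlike the two options above, this writes a new copy of the selected data into the warehouse directory, so it does use additional storage. This is done using the following syntax:
CREATE TABLE new_table AS SELECT column1, column2, column3 FROM test_table;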
So what's the difference between Managed and External Tables?
The main difference is that with a managed table, Hive manages both the metadata in the metastore and the underlying data (which it moved to the warehouse directory). If you drop the table, the metadata in the metastore is removed and the data on the filesystem is deleted as well.
With External Tables, Hive only manages the metadata in the metastore. If you drop the table, only the metadata in the Hive Metastore is removed. The actual files, however, stay on HDFS.
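The difference in drop behaviour can be sketched as follows. The table names `test_table_managed` and `test_table_ext` are hypothetical, and the path is the placeholder used earlier. On clusters with HDFS trash enabled, a dropped managed table's files may be moved to the trash rather than removed immediately.

```sql
-- Managed table (hypothetical name): removes the metastore entry
-- and the files under the table's warehouse directory.
DROP TABLE test_table_managed;

-- External table (hypothetical name): removes only the metastore entry.
DROP TABLE test_table_ext;

-- The CSV should still be listed here after the external table is dropped.
dfs -ls /hdfs_path/to/file/;
```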