
Data vanishing from HDFS after loading into a Hive table?

Contributor

Hi all,

I am using the Cloudera QuickStart VM 5.8.

I have loaded some flat files into HDFS.

I created an external table in Hive as follows:

CREATE External TABLE abc (ID int, Price double, Start_DTTM string, DEL_DT_TM string)
row format delimited fields terminated by ',' stored as textfile;

load data inpath '/user/cloudera/CPC/QSM/QSM_MarToApr2016.csv' into table abc;

The data loaded into the Hive table successfully.

But the data is vanishing from its HDFS location.

Please advise.

Thanks,

Syam.

Champion
Where did you put the files in HDFS in the first step?
You did not specify a LOCATION in the CREATE EXTERNAL TABLE statement, so the table defaults to the Hive warehouse directory.

LOAD DATA INPATH moves (rather than copies) the data from the specified path to the table's location. I think it moved it from /user/cloudera/CPC/QSM/QSM_MarToApr2016.csv to /user/hive/warehouse/abc/...
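A quick way to confirm where the file went, sketched for the Hive CLI and assuming the table abc from the original post: DESCRIBE FORMATTED prints the table's Location and Table Type, and the CLI's dfs command lists an HDFS directory. The warehouse path below is the usual default and is an assumption; use whatever Location the first statement actually reports.

```sql
-- Show table metadata; check the "Location:" and "Table Type:" rows.
DESCRIBE FORMATTED abc;

-- List the table's directory (assumes the default warehouse location).
dfs -ls /user/hive/warehouse/abc;

-- The original source directory should no longer contain the loaded file.
dfs -ls /user/cloudera/CPC/QSM;
```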

Contributor

Hi Mbigelow,

Thanks for the reply.

I uploaded the flat file to an HDFS location (/user/cloudera/QSM/).

Then I created the table as above and loaded the data.

The data loaded into Hive successfully.

But I don't want the data moved to the Hive warehouse.

I want Hive queries to return results without the data vanishing from HDFS.

Please guide me.

Thanks,

Syam.

Champion
You should set the location for the table. If you don't want the data moved, set it to /user/cloudera/QSM, the directory that already contains the file, and query the table without running LOAD DATA. Setting it to any other location, even one outside the warehouse, will still cause LOAD DATA to move the data from the original path to the table's location.

About your last statement: are you saying that after loading the data into the table, it was no longer in the original location and you also weren't getting any data back from the table?
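A minimal sketch of pointing the table at the existing directory so nothing is moved. It assumes the CSV files sit in /user/cloudera/CPC/QSM (the directory from the original LOAD DATA statement; adjust if your file is in /user/cloudera/QSM) and that this directory holds only files for this table, since Hive reads every file under LOCATION. Because the earlier LOAD DATA already moved the file into the warehouse, it has to be moved back into that directory first (for example with hdfs dfs -mv). Dropping abc is safe because it is an external table, so only its metadata is removed.

```sql
-- abc is EXTERNAL, so DROP removes only the metastore entry, not its files.
DROP TABLE IF EXISTS abc;

-- Point the table at the directory that already holds the CSV file(s).
-- LOCATION must be a directory, not a single file.
CREATE EXTERNAL TABLE abc (ID int, Price double, Start_DTTM string, DEL_DT_TM string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/cloudera/CPC/QSM';

-- Query directly; no LOAD DATA, so the files stay where they are.
SELECT * FROM abc LIMIT 10;
```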

Contributor

Okay, I will set the HDFS location while creating the table.

The data vanished from the original HDFS location, but it is there in the Hive location.

Thanks,

Syam.

Champion

When you create the external table, add a LOCATION clause (it overrides the table's default location in the Hive warehouse).

Then load the data from HDFS with LOAD DATA INPATH; note that this still moves the file into the table's LOCATION directory. If you drop the table later, Hive removes only the table's metadata from the metastore and does not delete the data in HDFS.

-- `date` is a Hive keyword, so quote it with backticks
CREATE EXTERNAL TABLE text1 (wban INT, `date` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/hive/data/text1';
-- This moves /data/2000.txt into /hive/data/text1
LOAD DATA INPATH '/data/2000.txt' INTO TABLE text1;
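To check the external-table behaviour described above, here is a sketch for the Hive CLI that reuses the text1 table and /hive/data/text1 directory from the example. After LOAD DATA the file lives under the table's LOCATION, and after DROP TABLE it is still there.

```sql
-- After LOAD DATA, the file has been moved under the table's LOCATION.
dfs -ls /hive/data/text1;

-- Dropping an EXTERNAL table deletes only its metadata in the metastore.
DROP TABLE text1;

-- The data files are still present in HDFS.
dfs -ls /hive/data/text1;
```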