Hive table load

Expert Contributor

Hello Friends,

I am new to Hadoop and Hive. I created a simple table with a single column, ID, and loaded data into it from a file on the local filesystem containing 6 records (one of them NULL), using the command "load data local inpath '/home/edureka/Desktop/data' into table emp;"

I ran a select and it showed 5 records. Later I manually edited the source file, removed those 5 records, added 5 new records, and loaded the new records without OVERWRITE, using the command "load data local inpath '/home/edureka/Desktop/data' into table emp;" The data load was successful.

This time, select * returns 18 records, and the first set appears twice. I don't know why it shows this. Am I missing a command? Please help me understand.

Please refer to the screenshot (capture.png).

Thanks

1 ACCEPTED SOLUTION

Guru

I think you answered your own question: you did not use OVERWRITE on the second "load" command, so that load added its records to the ones already in the table instead of replacing them. If you want to start over with all new data in the table, run the load command with OVERWRITE.
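For reference, here is a minimal sketch of the two variants of the load command, reusing the path and table name from the question. Without OVERWRITE, the rows in the file are appended to whatever the table already holds; with OVERWRITE, the table's existing contents are replaced. The final count query is only an illustrative check and was not part of the original post.

```sql
-- Appends: rows from the file are added to the rows already in emp.
LOAD DATA LOCAL INPATH '/home/edureka/Desktop/data' INTO TABLE emp;

-- Replaces: the existing contents of emp are removed, then the file is loaded.
LOAD DATA LOCAL INPATH '/home/edureka/Desktop/data' OVERWRITE INTO TABLE emp;

-- Illustrative check of how many rows the table holds after a load.
SELECT COUNT(*) FROM emp;
```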

5 REPLIES

Expert Contributor

Thanks. What should I use if I want to overwrite all the data (that is, do a fresh load)?

Guru

If you do an "INSERT OVERWRITE" then all the files in the table's LOCATION will be deleted and replaced with the new data.
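As a rough sketch, INSERT OVERWRITE takes its rows from a query rather than from a file. The staging table emp_staging below is hypothetical and is used only for illustration; the column name id is assumed from the single ID column described in the question.

```sql
-- emp_staging is a hypothetical source table, not something from the original post.
-- The current contents of emp are replaced by the result of the SELECT.
INSERT OVERWRITE TABLE emp
SELECT id FROM emp_staging;
```

When the new data comes straight from a file, as in the question, the same replace-everything effect is obtained by putting the OVERWRITE keyword in the LOAD DATA command.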

Master Guru

Did you expect to end up with only 12 records instead of 18?

If '/home/edureka/Desktop/data' is a directory and your file is called d1, I suspect that a backup file d1~ was created when you edited it. In that case the second load would have picked up both files, which would explain the 18 records.
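One way to test a suspicion like this is to ask Hive which underlying file each row came from. The sketch below uses Hive's INPUT__FILE__NAME virtual column; the aliases source_file and row_count are made up for readability, and the file names it returns will depend on the installation.

```sql
-- Count the rows contributed by each file stored under the table's location.
SELECT INPUT__FILE__NAME AS source_file,
       COUNT(*)          AS row_count
FROM emp
GROUP BY INPUT__FILE__NAME;
```

If more files show up than the number of loads that were run, an extra file such as an editor backup was probably loaded along with the intended one.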

Expert Contributor

No, I have a single file in the data directory. The first time, I created it with 5 records and loaded it into the table. Then I went back to the same file, deleted all 5 records, entered 5 new records, and loaded it again. I was missing the OVERWRITE keyword in the query. Now it's fine. Thanks.