Created 12-15-2015 07:25 PM
Hello Friends,
I am new to Hadoop and Hive. I created a simple table with one column, ID, and loaded data into it from a file on the local filesystem containing 6 records (one of them NULL), using the command "load data local inpath '/home/edureka/Desktop/data' into table emp;"
I ran a select and it showed 5 records. Later I manually edited the source file, removed those 5 records, added 5 new ones, and loaded the new records without OVERWRITE, using the command "load data local inpath '/home/edureka/Desktop/data' into table emp;" The data load was successful.
This time, select * returns 18 records, and the first set appears twice. I don't know why it shows up like this. Am I missing a command? Please help me understand.
Please refer to the screenshot (capture.png).
Thanks
Created 12-16-2015 02:00 AM
I think you answered your own question: you did not use OVERWRITE on the second "load" command, so you added the records twice. If you wanted to start over w/ all new data in the table, run the load command with OVERWRITE.
Created 12-16-2015 04:41 PM
Thanks. What should I use if I want to overwrite all the data (I mean a fresh load)?
Created 12-16-2015 05:08 PM
If you do an "INSERT OVERWRITE" then all the files in the table's LOCATION will be deleted and replaced with the new data.
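For the fresh-load case asked about here, a minimal sketch using the table and path from the original post. In the LOAD DATA form, OVERWRITE goes before INTO TABLE. The INSERT OVERWRITE form needs a source query, and `emp_staging` below is a hypothetical table used only for illustration.

```sql
-- Replace the current contents of emp with the contents of the local file.
LOAD DATA LOCAL INPATH '/home/edureka/Desktop/data' OVERWRITE INTO TABLE emp;

-- Alternative: replace the contents of emp with the result of a query.
-- emp_staging is a hypothetical table, not part of the original post.
INSERT OVERWRITE TABLE emp SELECT * FROM emp_staging;
```

Without OVERWRITE, LOAD DATA adds the new file alongside whatever files are already in the table's location, so the earlier records remain visible.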
Created 12-16-2015 08:45 AM
Did you expect to end up with only 12 records instead of 18?
If '/home/edureka/Desktop/data' is a directory and your file is called d1, I suspect that editing it created a backup file d1~. In that case both files were loaded into the table the second time, which would explain the 18 records.
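One way to check this kind of suspicion is to ask Hive which files the rows come from. A minimal sketch, assuming the table is named emp as in the load command; INPUT__FILE__NAME is a virtual column that Hive exposes for every row.

```sql
-- Count rows per underlying file in the table's location.
SELECT INPUT__FILE__NAME, COUNT(*) AS row_count
FROM emp
GROUP BY INPUT__FILE__NAME;
```

More than one file name in the result means several files are backing the table.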
Created 12-16-2015 04:43 PM
No, I have a single file in the data directory. The first time, I created it with 5 records and loaded it into the table. Then I went back to the same file, deleted all 5 records, entered 5 new ones, and loaded it again. I was missing the OVERWRITE keyword in the query. Now it's fine. Thanks.