Support Questions
Find answers, ask questions, and share your expertise

Overwriting a column in HIVE

Contributor

Hello there,

I need to overwrite a column in a HIVE table with data from another HIVE table.

In my exemple I have a a table employees and a table salary. I need to overwrite just ONE column of the employees table....let's a say a from start_date with values from called salary_date in the salary table. Both salary and employees table have a a common key which is the coluns employee_number.

Here is what I have been trying to do without success.

INSERT OVERWRITE TABLE employees SELECT employees.start_date salary.salary_date FROM salary employees WHERE salary.employee_number = employee.employee_number

Any insights on where I might be making a mistake here would help. I see some exemples suggesting to use partitions but I am not sure how to use this in this case, as I am considering to overwrite the whole start_date column and not just some values.

Thanks!

Wellington

3 REPLIES 3

You cannot overwrite one column you need to recreate the whole table. Like in the CTAS discussion we had. So if your employees table has 10 columns you need something like

INSERT OVERWRITE TABLE employees SELECT employees.<all columns but salary_date>, salary.salary_date FROM salary employees WHERE salary.employee_number = employee.employee_number;

You also need to order the columns correctly like:

select e.id, e.date,salary.salary_date,e.name,e.lastname. ....

Contributor

Thanks for the clarifications, Benjamin. It makes sense. I did as you instructed and I am getting an error related to the salary.employee_number... I have checked the expression and all the naming seems accurate. Have you had this error?

Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:227 Invalid table alias or column reference 'salary': (possible column names are:.....

FROM salary employees will syntax above alias the salary table as employees. I guess your query needs to join on employees table with left or inner join? Also you will probably not want to overwrite the table you are selecting from?

INSERT OVERWRITE TABLE employees SELECT employees.<all columns but salary_date>, salary.salary_date FROM salary inner join employees on salary.employee_number = employee.employee_number;