Created 02-28-2016 08:37 PM
Hi All,
I have created an external table from an existing parquet file:
+---------+------------------+-----------------------------+
| name | type | comment |
+---------+------------------+-----------------------------+
| name | string | Inferred from Parquet file. |
| address | struct< | Inferred from Parquet file. |
| | street:string, | |
| | city:string | |
| | > | |
+---------+------------------+-----------------------------+
Then I was trying to learn how schema evolution works and I created a new parquet file with a slightly different schema. New file now has an extra field in address called 'house_no'. Now if I executed a query involving address field, I woud get an error:
Query: select address.house_no, address.street, address.city from existing_parquet
ERROR: AnalysisException: Could not resolve column/field reference: 'address.house_no'
Then I was trying to alter the table to accomadate the new field in address, but I coudn't figure out a way to do that. For instance, if I executed something like:
ALTER TABLE existing_parquet ADD COLUMNS (address.house_no INTEGER);
I'd get a syntax error:
ERROR: AnalysisException: Syntax error in line 1:
...rquet ADD COLUMNS (address.house_no INTEGER)
^
Encountered: .
Expected: ARRAY, BIGINT, BINARY, BOOLEAN, CHAR, DATE, DATETIME, DECIMAL, REAL, FLOAT, INTEGER, MAP, SMALLINT, STRING, STRUCT, TIMESTAMP, TINYINT, VARCHAR
CAUSED BY: Exception: Syntax error
So my question is is it possible to add a field to a complex type? If so how do I refer to the nested columns?
Thanks
Lloyd
Created 02-28-2016 10:09 PM
Hi Lloyd,
you need to alter the type of the top-level column, i.e.,:
ALTER TABLE existing_parquet
CHANGE COLUMN address address STRUCT<:street:STRING,city:STRING,house_no INTEGER>
Unfortunately, there currently is no other way to make alterations to nested fields.
Alex
Created on 03-02-2016 04:30 PM - edited 03-02-2016 04:32 PM
You need to change nested field using hive. Here are my steps:
1. change top-level column type from hive. It supports complex types in 'alter table x change column y y struct<xyz>'
2. from impala shell run 'refresh *table_name*'
@alex.behm is there a ticket in jira for changing top-level columns of complex types? The alter command fails with 'invalid type' error message. Here is an example that works in hive but fails in impala:
alter table A change column B B struct<C: string, NEW_D: string>;
Created 03-02-2016 06:21 PM
Indeed there is, and the issue has been fixed! See:
https://issues.cloudera.org/browse/IMPALA-2974
You can work around that specific problem in Impala by using the following syntax.
ALTER TABLE A REPLACE COLUMNS (complete list of column definitions)
My apologies for the inconvenience.