Posts: 11
Registered: ‎05-01-2014

Convert String column to int

[ Edited ]

How can I convert a column that has numeric values loaded as strings to be an int data type?  I know how to cast the results as int's, but is there a way to convert the column type and the data in the column?  How about just creating a new column with data from another one?  


Is there any performance to be gained by not using cast's? 


Edit: the data is stored as a parquetfile

Cloudera Employee
Posts: 14
Registered: ‎09-09-2013

Re: Convert String column to int

You should definitely avoid casts if possible. It will lead to better
performance since it means there is less processing work that is needed
when you execute a query.

How you go about performing the conversion can vary based on the underlying
file format. For text file format this is as simple as changing the table
alter table change column ;

For a binary file format (parquet), the data is stored on disk differently
based on the column type. This allows for fast querying of the data, but
changing the type is more than a metadata-only operations.

For parquet, you could create a staging table that has the desired schema
and then run the following. Adding in the appropriate casts to match the
column types in the staging table.

Once you are done you can perform:
ALTER TABLE RENAME src_table src_table_old;
ALTER TABLE RENAME staging_table src_table;

This way you will pay the cost of casting the data once (when you write it
to disk) rather than for every query of the data.