Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Convert String column to int

Convert String column to int

Explorer

How can I convert a column that has numeric values loaded as strings to be an int data type?  I know how to cast the results as int's, but is there a way to convert the column type and the data in the column?  How about just creating a new column with data from another one?  

 

Is there any performance to be gained by not using cast's? 

 

Edit: the data is stored as a parquetfile

1 REPLY 1

Re: Convert String column to int

Cloudera Employee
You should definitely avoid casts if possible. It will lead to better
performance since it means there is less processing work that is needed
when you execute a query.

How you go about performing the conversion can vary based on the underlying
file format. For text file format this is as simple as changing the table
metadata:
alter table change column ;

For a binary file format (parquet), the data is stored on disk differently
based on the column type. This allows for fast querying of the data, but
changing the type is more than a metadata-only operations.

For parquet, you could create a staging table that has the desired schema
and then run the following. Adding in the appropriate casts to match the
column types in the staging table.
INSERT OVERWRITE SELECT col1, col2, ..., colN FROM
table>;

Once you are done you can perform:
ALTER TABLE RENAME src_table src_table_old;
ALTER TABLE RENAME staging_table src_table;

This way you will pay the cost of casting the data once (when you write it
to disk) rather than for every query of the data.

Thanks,
Lenni