Created on 02-25-2016 12:02 PM - edited 09-16-2022 03:05 AM
Hi,
Can you please explain the difference between the STRING and CHAR(10) data types in Impala?
I understand there is no need to specify the number of bytes for the STRING data type.
In terms of storage, coding, and performance, which is better?
Created 02-25-2016 01:01 PM
Hi @prakash pal, there are some differences between these data types. Basically, STRING holds a variable-length sequence of characters (max. 32K chars), while CHAR is a fixed-length string (max. 255 chars). Usually (and I doubt that Impala is different here) CHAR is more efficient, can speed up operations, and is better regarding memory allocation. (This does not mean you should always use CHAR.)
See this => "All data in CHAR and VARCHAR columns must be in a character encoding that is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use a STRING column to hold it."
There are a lot of use cases where it makes sense to use CHAR instead of STRING. For example, say you want a column that stores the two-letter country code (ISO_3166-1_alpha-2; e.g. US, ES, GB,...); here it makes more sense to use CHAR.
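To make the fixed-length vs. variable-length difference concrete, here is a rough Python sketch of how I understand the two types to behave. It is only a model, not Impala code: the helper names `to_char` and `to_string` are made up for illustration, and the padding/truncation behaviour is my assumption based on how CHAR(n) is usually described (short values padded with trailing spaces, long values cut off at n characters).

```python
def to_char(value: str, length: int) -> str:
    """Model a CHAR(length) column (assumed behaviour):
    truncate longer values, pad shorter ones with trailing spaces."""
    return value[:length].ljust(length)


def to_string(value: str) -> str:
    """Model a STRING column: the value is kept as-is, whatever its length."""
    return value


# A two-letter country code fits CHAR(2) exactly, so nothing is padded or lost.
print(repr(to_char("US", 2)))

# A short value in CHAR(10) always occupies 10 characters.
print(repr(to_char("abc", 10)))

# A value longer than the declared length is cut off.
print(repr(to_char("a much longer value", 10)))

# STRING keeps the original value unchanged.
print(repr(to_string("a much longer value")))
```

So CHAR fits best when every value really has the same length (like the country code), while STRING avoids both the padding and the truncation when lengths vary.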
Created 02-25-2016 12:05 PM
Impala is not part of the HDP stack, but just to help you out:
1) http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_string_functions.html
2) Char http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_char.html