Vectorization is supported for char and varchar datatypes, as per the link below:
What about BIGINT and INT?
If I have a column with BIGINT as its datatype and it holds a value within INT's range, will it take the full 8 bytes or just 4 bytes for storage?
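A hedged sketch of what is likely going on: during vectorized execution Hive generally materializes integer columns at their declared width in memory, but on disk the common ORC format uses variable-length and run-length integer encodings, so a BIGINT column full of INT-range values typically does not cost 8 bytes per value in storage. The toy LEB128-style varint encoder below (not Hive's actual code) illustrates why small values compress well:

```python
def varint_encode(value: int) -> bytes:
    """Encode a non-negative int using 7 payload bits per byte (LEB128-style)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# A value well inside INT's range needs far fewer than 8 bytes:
print(len(varint_encode(42)))        # 1 byte
print(len(varint_encode(100_000)))   # 3 bytes
print(len(varint_encode(2 ** 62)))   # 9 bytes: very large values can even exceed 8
```

So the declared type mainly bounds the range and the in-memory width; the on-disk footprint depends on the file format's encoding, not just the declared type.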
Something to consider is the downstream uses of Hive as well. We used String initially (mainly because we weren't given data types for the data we were loading, so String let us ignore the question). However, when we started using SAS, those String fields were all converted to varchar(32k) and caused headaches on that end. We converted them to varchar(x).
Hi @Jeff Watson. You are correct about SAS's handling of the String datatype. Good catch! One of my customers also had to deal with this; String datatype conversions can perform very poorly in SAS.
With SAS/ACCESS to Hadoop you can set the libname option DBMAX_TEXT (added in the SAS 9.4M1 release) to globally restrict the character length of all columns read into SAS.
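A minimal sketch of how that libname option is typically set; the server, port, and schema values here are placeholders, and the 1024 cap is just an example:

```sas
/* Hypothetical connection details; DBMAX_TEXT caps every character
   column read through this libref at 1024 bytes instead of 32K. */
libname hdp hadoop server="hive.example.com" port=10000
        schema=default dbmax_text=1024;
```

Note this is a global cap per libref, so it can silently truncate genuinely long text columns; the per-column VARCHAR approach below is more precise.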
However, for restricting column size, SAS specifically recommends using the VARCHAR datatype in Hive whenever possible.
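For illustration, a hypothetical Hive DDL following that recommendation, sizing each column to the width the data actually needs rather than defaulting to STRING (which SAS widens to 32K characters):

```sql
-- Hypothetical table; widths are examples, not a recommendation.
CREATE TABLE customers (
  id    BIGINT,
  name  VARCHAR(100),
  email VARCHAR(254)
);
```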
The following techniques can be used to work around the challenge in SAS, and they all work:
I hope this helps.