Is the Impala String column/datatype REALLY limited to 32,767 characters?

New Contributor

The Impala documentation lists a maximum length of 32,767 bytes for the STRING data type, but this limit does not appear to be enforced. Using the Hive JDBC driver, I can successfully insert and retrieve STRING column values of more than 1,300,000 characters. Impala built-in string functions such as length() and strleft() also work correctly with these very large values. I have used these large columns successfully with both CSV and Parquet storage formats.
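
For anyone who wants to reproduce this, the test can be sketched roughly as below. This is a minimal sketch rather than the exact code used: the class name, the table layout (id INT, s STRING), and the command-line arguments are all assumptions for illustration, and the Hive JDBC driver jar has to be on the classpath. How the driver transmits the bound value is driver-dependent.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;

final class LargeStringProbe {
    // Maximum STRING length, in bytes, as stated in the Impala documentation.
    static final int DOCUMENTED_LIMIT = 32767;

    // Builds an ASCII payload of exactly n characters (one byte each in UTF-8).
    static String makePayload(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'x');
        return new String(chars);
    }

    // Inserts the payload and returns what Impala's length() reports for it.
    // Assumes a hypothetical, initially empty table with columns (id INT, s STRING).
    static int roundTripLength(Connection conn, String table, String payload)
            throws SQLException {
        try (PreparedStatement ins =
                 conn.prepareStatement("INSERT INTO " + table + " VALUES (1, ?)")) {
            ins.setString(1, payload);
            ins.execute();
        }
        try (PreparedStatement sel =
                 conn.prepareStatement("SELECT length(s) FROM " + table + " WHERE id = 1");
             ResultSet rs = sel.executeQuery()) {
            if (!rs.next()) {
                throw new SQLException("no row returned from " + table);
            }
            return rs.getInt(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        String payload = makePayload(1_300_000);
        System.out.println("payload exceeds documented limit: "
                + (payload.length() > DOCUMENTED_LIMIT));
        if (args.length < 2) {
            return; // no JDBC URL and table name given, so skip the server round trip
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            System.out.println("length() reported: "
                    + roundTripLength(conn, args[1], payload));
        }
    }
}
```

Run it with a JDBC URL (for example a jdbc:hive2:// URL pointing at an Impala daemon) and a table name as arguments. Without arguments it only builds the payload and skips the server round trip.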

Are there specific functions or features that do enforce this limit? So far I have not seen any other questions or comments on this topic.

Thanks,
 -sid


My configuration:
Impala Server version: 2.1.2-cdh5
Hive JDBC driver version: 0.13.1-cdh5.3.2
Java version 1.8.0


From http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_string.html :
"Length: Maximum of 32,767 bytes. Do not use any length constraint when declaring STRING columns, as you might be familiar with from VARCHAR, CHAR, or similar column types from relational database systems. If you do need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare columns as VARCHAR(max_length) or CHAR(length), but for best performance use STRING where practical."

3 REPLIES

New Contributor

Contributor

I believe the 32,767 limit has only ever applied in certain situations, for example to string literals or to string arguments passed to certain functions. Reading strings longer than 32K out of data files may always have worked, but what happened to them afterwards was not guaranteed. I'll check which of these limitations have been lifted.
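
One way to narrow this down is to run small probe queries at lengths just below, at, and just above 32,767, once with a long literal and once with a value built on the server and passed to a function. The following is a minimal sketch that only generates the SQL text. The class and method names are made up for illustration, and upper() and reverse() are just example functions to try, not ones known to be affected.

```java
final class LimitBoundaryProbes {
    // Maximum STRING length, in bytes, as stated in the Impala documentation.
    static final int DOCUMENTED_LIMIT = 32767;

    // Lengths just below, at, and just above the documented limit.
    static int[] boundaryLengths() {
        return new int[] {DOCUMENTED_LIMIT - 1, DOCUMENTED_LIMIT, DOCUMENTED_LIMIT + 1};
    }

    // Query whose string literal is n characters long.
    static String literalProbe(int n) {
        StringBuilder sql = new StringBuilder("SELECT length('");
        for (int i = 0; i < n; i++) {
            sql.append('x');
        }
        return sql.append("')").toString();
    }

    // Query that builds an n-character value on the server with repeat()
    // and passes it to the named string function, avoiding a long literal.
    static String functionArgProbe(String function, int n) {
        return "SELECT length(" + function + "(repeat('x', " + n + ")))";
    }

    public static void main(String[] args) {
        for (int n : boundaryLengths()) {
            System.out.println(n + ": literal probe is "
                    + literalProbe(n).length() + " chars of SQL");
            System.out.println(functionArgProbe("upper", n));
            System.out.println(functionArgProbe("reverse", n));
        }
    }
}
```

If a probe fails or returns a shorter length only at 32,768 and above, that case is one where the limit is actually enforced.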

John

Cloudera Employee

The documentation is not clear on this point. Defect IMPALA-5740 will address it.

https://issues.apache.org/jira/browse/IMPALA-5740

Thanks,

Luis Martinez.