The Impala documentation lists a maximum size of 32,767 bytes for the STRING datatype, but this limit does not appear to be enforced. I can successfully insert and retrieve STRING column values of more than 1,300,000 characters using the Hive JDBC driver, and Impala built-in string functions such as length() and strleft() work correctly on these very large values. I have used these large columns with both CSV and Parquet storage formats.
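For reference, a minimal sketch of the kind of check described above. The table and column names are my own, and the placeholder literal stands in for the >1,300,000-character value, which was actually inserted through JDBC:

```sql
-- Hypothetical table: STRING column with no declared length
CREATE TABLE big_strings (s STRING) STORED AS PARQUET;

-- Insert a value far longer than the documented 32,767-byte limit
-- ('...' is a placeholder for the oversized string)
INSERT INTO big_strings VALUES ('...');

-- Built-in string functions handled the oversized values correctly
SELECT length(s), strleft(s, 100) FROM big_strings;
```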
Are there specific functions or features that actually enforce this limit? So far I have not found any other questions or comments on this particular topic.
Impala server version: 2.1.2-cdh5
Hive JDBC driver version: 0.13.1-cdh5.3.2
Java version: 1.8.0
From http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_string.html :
"Length: Maximum of 32,767 bytes. Do not use any length constraint when declaring STRING columns, as you might be familiar with from VARCHAR, CHAR, or similar column types from relational database systems. If you do need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare columns as VARCHAR(max_length) or CHAR(length), but for best performance use STRING where practical."
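Per the quoted documentation, the two declaration styles would look like this (table and column names are mine):

```sql
-- Recommended: STRING with no length constraint
CREATE TABLE t_flexible (s STRING);

-- Impala 2.0 and higher: bounded types when precise or maximum lengths are needed
CREATE TABLE t_bounded (v VARCHAR(32767), c CHAR(10));
```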
I believe the 32,767 limit has only ever applied in certain situations, for example to string literals or to string arguments passed to certain functions. Reading >32K strings out of data files may have always worked, but what happened to those values afterwards was not guaranteed. I'll check which of these limitations have been lifted.