The Impala documentation lists a maximum size of 32,767 bytes for the STRING data type, but this limit does not appear to be enforced. I can successfully insert and retrieve STRING column values longer than 1,300,000 characters using the Hive JDBC driver. Impala built-in string functions such as length() and strleft() also work correctly with these very large values. I have used these large columns successfully with both CSV and Parquet storage formats.

Are there specific functions or features where this limit actually applies? So far I have not seen any other questions or comments on this topic.

Thanks,
-sid

My configuration:
Impala Server version: 2.1.2-cdh5
Hive JDBC driver version: 0.13.1-cdh5.3.2
Java version: 1.8.0

From http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_string.html :

"Length: Maximum of 32,767 bytes. Do not use any length constraint when declaring STRING columns, as you might be familiar with from VARCHAR, CHAR, or similar column types from relational database systems. If you do need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare columns as VARCHAR(max_length) or CHAR(length), but for best performance use STRING where practical."