sid
New Contributor
Posts: 2
Registered: ‎08-19-2015

Is the Impala String column/datatype REALLY limited to 32,767 characters?

The Impala documentation lists a maximum size of 32,767 bytes for the STRING data type, but this does not appear to be enforced. I can successfully insert and retrieve STRING column values in excess of 1,300,000 characters using the Hive JDBC driver. Impala's built-in string functions, such as length() and strLeft(), also work properly with these very large values. I have used these large columns successfully with both CSV and Parquet storage formats.
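One detail worth separating out when reproducing this kind of test: the documented limit is stated in bytes, while the values above are counted in characters. Below is a minimal sketch of how test values could be generated and measured on the client side before inserting them. The class and method names are hypothetical, and it assumes the values are stored as UTF-8, which is not something confirmed in this thread. It only measures strings locally and does not talk to Impala.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringLimitCheck {
    // Documented STRING maximum in bytes, taken from the Impala documentation.
    static final int DOCUMENTED_MAX_BYTES = 32_767;

    // Builds a test value of the requested length by repeating one character.
    // Uses Arrays.fill so it also compiles on Java 8 (no String.repeat).
    static String makeValue(int chars, char fill) {
        char[] buf = new char[chars];
        Arrays.fill(buf, fill);
        return new String(buf);
    }

    // Size of the value when encoded as UTF-8 (assumed storage encoding).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the encoded value is larger than the documented limit.
    static boolean exceedsDocumentedLimit(String s) {
        return utf8Bytes(s) > DOCUMENTED_MAX_BYTES;
    }

    static void report(String label, String s) {
        System.out.println(label + ": " + s.length() + " chars, "
                + utf8Bytes(s) + " bytes, over documented limit: "
                + exceedsDocumentedLimit(s));
    }

    public static void main(String[] args) {
        // ASCII value of the size mentioned in the question.
        report("ascii", makeValue(1_300_000, 'x'));
        // Non-ASCII value: fewer than 32,767 characters but more bytes.
        report("accented", makeValue(20_000, '\u00e9'));
        // Value sitting exactly at the documented limit.
        report("boundary", makeValue(DOCUMENTED_MAX_BYTES, 'x'));
    }
}
```

Values built this way could then be bound to a PreparedStatement parameter in whatever JDBC test is already in place, which makes it easier to tell whether a given function misbehaves at a byte boundary or a character boundary.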

 

Are there specific functions or features which manifest this limitation? So far I have not seen any other questions or comments related to this particular topic.

 

Thanks,
 -sid


My configuration:
Impala Server version: 2.1.2-cdh5
Hive JDBC driver version: 0.13.1-cdh5.3.2
Java version 1.8.0


From http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_string.html :
"Length: Maximum of 32,767 bytes. Do not use any length constraint when declaring STRING columns, as you might be familiar with from VARCHAR, CHAR, or similar column types from relational database systems. If you do need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare columns as VARCHAR(max_length) or CHAR(length), but for best performance use STRING where practical."

Cloudera Employee
Posts: 20
Registered: ‎09-11-2013

Re: Is the Impala String column/datatype REALLY limited to 32,767 characters?

I believe the 32,767 limit only ever applied in certain situations, for example string literals or string arguments passed to certain functions. Reading strings longer than 32K out of data files may always have worked, but what happened afterwards was not guaranteed. I'll check which kinds of limitations were lifted.

 

John

Cloudera Employee
Posts: 2
Registered: ‎05-17-2017

Re: Is the Impala String column/datatype REALLY limited to 32,767 characters?

The documentation is not clear about this. Defect IMPALA-5740 will address this question.

 

https://issues.apache.org/jira/browse/IMPALA-5740

 

Thanks,

Luis Martinez.