
Calling Phoenix CSVCommonsLoader/CSVParser throws exception indicating timestamp exceeds max capacity for data type

Contributor

I'm using HDP 2.5.3.0 and calling the Phoenix library CSVCommonsLoader and CSVParser to parse and load a tab-delimited CSV file. The library correctly converts the data into an upsert statement like UPSERT INTO SCHEMA.MYTABLE(ACFT, FILENAME, FLIGHT_TS, ....), but when it goes to insert the first record using values [123456, 123456_201410027_18327_000.tsv, 2014-10-27 18:22:22.000] it throws java.sql.SQLException: ERROR 206 (22003): The data exceeds the max capacity for the data type. value='2014-10-27 18:22:22.000', columnName=FLIGHT_TS. The data types are coming from the Phoenix table itself.

Has anyone else experienced this and found a workaround? I downloaded the Phoenix project at tag 4.7.0-HBase-1.1, but when I step through the code that parses the records to convert them into the upsert statement, I'm obviously stepping through the wrong version of the source. I'll certainly investigate more once I can find the right version of the source. From what I can see, the code I don't have right is commons-csv-1.0. I did notice while looking around in the code that there is a static string named TIMESTAMP_FORMAT_ATTRIB that is set to "phoenix.query.timestampFormat", but setting that had no effect.
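
For reference, here is roughly how I'm invoking the loader (a simplified sketch; the JDBC URL, file path, and column list are placeholders, and the constructor arguments are my best reading of the 4.7 API):

    import java.sql.DriverManager;
    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.phoenix.jdbc.PhoenixConnection;
    import org.apache.phoenix.util.CSVCommonsLoader;

    public class TsvLoad {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Property name from TIMESTAMP_FORMAT_ATTRIB; setting it had no
            // effect on the error in my case.
            props.setProperty("phoenix.query.timestampFormat", "yyyy-MM-dd HH:mm:ss.SSS");

            try (PhoenixConnection conn = DriverManager
                    .getConnection("jdbc:phoenix:zk-host:2181:/hbase-unsecure", props)
                    .unwrap(PhoenixConnection.class)) {
                // Tab as the field delimiter; quote char, escape char, and
                // array element separator left at their 4.7 defaults.
                CSVCommonsLoader loader = new CSVCommonsLoader(conn,
                        "SCHEMA.MYTABLE",
                        Arrays.asList("ACFT", "FILENAME", "FLIGHT_TS"), // plus the rest
                        true,            // isStrict
                        '\t', '"', null, ":");
                loader.upsert("/path/to/123456_201410027_18327_000.tsv");
            }
        }
    }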

Other team members who use the bulk load utility do not encounter this problem, but someone who is writing PSQL that calls some Java code is hitting the same problem.

3 REPLIES

Contributor

OK, I just discovered one issue: I assumed that /usr/hdp/2.5.3.0-37/phoenix/commons-csv-1.0.jar was the same version as the org.apache.commons.csv.CSVParser bundled in phoenix-4.7.0.2.5.3.0-37-client.jar. However, the sizes of the class files are fairly different between the two jars. That might explain why, when I step through the CSVParser code, I land right in the middle of some source code.

Contributor

Duh, looking in the Phoenix pom.xml, I find they really are using commons-csv version 1.0. Ugh. But now I see that the Phoenix-4.7.0-HBase-1.1 pom.xml is actually for HBase version 1.1.3 and Hadoop version 2.5.1, not Hadoop version 2.5.3, which is what I'm running. I'll keep checking out releases. According to http://hortonworks.com/products/data-center/hdp/, HDP 2.5.3 should be using Phoenix 4.7.0 and HBase 1.1.2, so I'm a little lost on which version of Phoenix I should be grabbing.

Super Collaborator
@Jeff Watson

Have you tried logging the generated upsert statement and executing it manually using sqlline? (There is a plain-JDBC sketch below as well.) As for the versions of the libraries, keep in mind that Phoenix in HDP carries a number of patches and features compared to the Apache Phoenix 4.7 release. If you are building your application for HDP, I would suggest using the corresponding libraries (whether Phoenix, HBase, or Hadoop), which are located at

http://nexus-private.hortonworks.com/nexus/content/groups/public
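
Regarding the first point: if sqlline is not convenient, the same check can be done from plain JDBC by binding the value as a java.sql.Timestamp instead of a string. A minimal sketch (the JDBC URL is a placeholder, and I trimmed the column list to the ones from your post):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;

    public class UpsertCheck {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager
                    .getConnection("jdbc:phoenix:zk-host:2181:/hbase-unsecure");
                 PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO SCHEMA.MYTABLE (ACFT, FILENAME, FLIGHT_TS) VALUES (?, ?, ?)")) {
                // Adjust the setters to your actual column types.
                ps.setString(1, "123456");
                ps.setString(2, "123456_201410027_18327_000.tsv");
                // If binding a real Timestamp succeeds, the problem is in the
                // CSV string-to-type conversion, not in the column definition.
                ps.setTimestamp(3, Timestamp.valueOf("2014-10-27 18:22:22.000"));
                ps.executeUpdate();
                conn.commit(); // Phoenix connections are not auto-commit by default
            }
        }
    }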

If you are using Maven, I would suggest adding it to the repositories section of your pom.xml:

    <repository>
      <id>public</id>
      <name>hwx</name>
      <url>http://nexus-private.hortonworks.com/nexus/content/groups/public/</url>
    </repository>

and use the corresponding versions of the phoenix/hbase/hadoop libraries. For example, the hadoop version is supposed to be

2.7.3.2.5.3.0-37
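
so a Hadoop dependency in your pom.xml would look something like this (standard Apache coordinates; only the version string carries the HDP build suffix, and hadoop-client is just one example artifact):

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.3.2.5.3.0-37</version>
    </dependency>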

And a couple of words on why the CSV classes are different: Phoenix uses shading for most of its external libraries, so classes may be modified when they are added to the final jar (to reflect the shaded imports). It should not affect the debug information, though.
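
For illustration, a shade relocation in a Maven build looks like the following; this is a generic sketch of the mechanism, not the exact Phoenix build configuration:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <relocations>
          <relocation>
            <!-- Bytecode referencing this package is rewritten to point at the
                 shaded package, so the classes in the client jar no longer
                 match the plain commons-csv-1.0.jar byte for byte. -->
            <pattern>org.apache.commons.csv</pattern>
            <shadedPattern>org.apache.phoenix.shaded.org.apache.commons.csv</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </plugin>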