
NiFi - ExecuteSQL processor changing data types to string


The ExecuteSQL processor currently generates data in Avro format. When fetching data from Oracle, the generated Avro converts every data type to a string. Is there a way to ensure that it retains the data type of the original column, or an equivalent type?

1 ACCEPTED SOLUTION

Master Guru

It should be retaining other types besides strings... What types are your columns that are being converted to strings?

It looks like it should be retaining BINARY, VARBINARY, LONGVARBINARY, ARRAY, BLOB, CLOB, byte, number, and boolean.

BigInteger and BigDecimal are converted to strings because Avro doesn't have a numeric type that can represent them.

For reference the code that performs the conversion is here:

https://github.com/apache/nifi/blob/e4b7e47836edf47042973e604005058c28eed23b/nifi-nar-bundles/nifi-s...
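The branching described above can be sketched as follows. This is a simplified model of the behavior the thread describes, not NiFi's actual source; the method name `toAvroValue` is hypothetical:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class AvroValueSketch {

    // Mirrors the behavior described above: BigDecimal/BigInteger are
    // stringified because Avro (before logical types) has no numeric type
    // wide enough, while other Numbers and Booleans pass through unchanged.
    static Object toAvroValue(Object value) {
        if (value == null) {
            return null;
        } else if (value instanceof BigDecimal || value instanceof BigInteger) {
            return value.toString();   // no matching Avro numeric type
        } else if (value instanceof Number || value instanceof Boolean) {
            return value;              // int, long, double, boolean, ...
        } else if (value instanceof byte[]) {
            return value;              // BINARY/VARBINARY/BLOB payloads
        } else {
            return value.toString();   // everything else falls back to string
        }
    }

    public static void main(String[] args) {
        // BigDecimal is stringified, a plain long keeps its type
        System.out.println(toAvroValue(new BigDecimal("123.45")).getClass().getSimpleName()); // String
        System.out.println(toAvroValue(42L).getClass().getSimpleName());                      // Long
    }
}
```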


8 REPLIES



@Bryan Bende Thanks Bryan. I double-checked all the details. My Oracle table has NUMBER columns. Here are the settings I am using to establish the DBCPConnectionPool, but my Avro data has all NUMBER columns coming through as strings. Is there any way to see how NiFi is generating the Avro field formats?

(screenshot: oracleconnection.jpg)

Master Guru

I'm wondering if it is related to the Oracle driver... the code in NiFi that I linked to above shows how we convert the fields to Avro. It starts by getting the value from the ResultSet like this:

final Object value = rs.getObject(i);

Then it checks what type the value is; in the case of a number it does this:

else if (value instanceof Number || value instanceof Boolean) {
    rec.put(i - 1, value);
}

So if the Oracle driver returns any subclass of Number, then that is what is given to Avro.

I think a good test would be to write a simple Java program using JDBC and the same driver you are using with NiFi, query your table, and see what getObject(i) returns from the ResultSet for your NUMBER columns. If it doesn't return Numbers, then that is the problem; if it does, then something else is happening after that.
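A diagnostic along those lines might look like the sketch below. The connection URL, credentials, and table name (`MY_TABLE`) are placeholders to replace with your own; the `typeOf` helper just exposes the runtime class of each column value:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class OracleTypeProbe {

    // Helper so the runtime class of each column value is easy to inspect.
    static String typeOf(Object value) {
        return value == null ? "null" : value.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- substitute your host/service/credentials.
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             // MY_TABLE is a placeholder for a table with NUMBER columns.
             ResultSet rs = stmt.executeQuery("SELECT * FROM MY_TABLE WHERE ROWNUM <= 5")) {

            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    Object value = rs.getObject(i);   // same call NiFi makes
                    System.out.printf("%s -> %s%n", meta.getColumnName(i), typeOf(value));
                }
            }
        }
    }
}
```

If the NUMBER columns print `java.math.BigDecimal`, the driver (not NiFi's ResultSet handling) is where the BigDecimal originates.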

Master Guru

Actually, was just reading this:

https://docs.oracle.com/cd/B19306_01/java.102/b14188/datamap.htm

One of the rows says:

SQL types: DEC, DECIMAL, NUMBER, NUMERIC → Oracle type: oracle.sql.NUMBER → Java type: java.math.BigDecimal

So if that is true then Oracle is returning BigDecimal for your NUMBER columns, and then NiFi's code is turning BigDecimal and BigInteger into Strings.

Based on this, it looks like there may be a way to represent BigDecimal in Avro:

https://issues.apache.org/jira/browse/AVRO-1402

https://avro.apache.org/docs/1.7.7/spec.html#Logical+Types

We should look into using the decimal logical type in NiFi.
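For reference, the spec linked above represents decimals as a logical type annotating a bytes (or fixed) field. A schema for something like an Oracle NUMBER(10,2) column might look as follows (a sketch; the record and field names are hypothetical):

```json
{
  "type": "record",
  "name": "MyTable",
  "fields": [
    {
      "name": "amount",
      "type": {
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 10,
        "scale": 2
      }
    }
  ]
}
```

With a schema like this, the unscaled value is stored as bytes and readers that understand the logical type can reconstruct the BigDecimal, instead of receiving a plain string.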


Thank you @Bryan Bende for creating the JIRA for this issue. Are there any workarounds until this issue gets resolved in a new release? I can only think of creating a Hive table with the appropriate column types, then writing a SELECT query to transform to the correct data types and inserting into a new table. But this requires a post-process and breaks real-time ingestion.

Master Guru

@Shishir Saxena Unfortunately I can't think of many good workarounds using out-of-the-box functionality, but I guess it depends on what you are doing with the Avro after ExecuteSQL. Are you landing it in HDFS, or are you converting it to some other format and doing something else?

Expert Contributor

It looks like the JIRA is still open for this issue. Two questions: 1) Where is this on the roadmap to get fixed? 2) Does this affect all Oracle data types, or just the numeric types that convert to Java BigDecimal?