Support Questions

Find answers, ask questions, and share your expertise

Apache NiFi working with nulls/empty in date/datetime/timezone columns

avatar

hi there,

So I need your help with something. So I am extracting 4 columns out of a database. Out of these 4 columns, two columns are DATE in Oracle and I want to convert them into DATETIME (as they are going to be inserted into BigQuery). In addition to this, I am using an UpdateRecord to generate a new column and for that, I am using the following AVRO Schema:

{
"namespace": "example.avro",
"type": "record",
"name": "my_relevant_table",
"fields": [
{
"name": "ID_REQUEST",
"type": [
"string",
"null"
]
},
{
"name": "ID_PROCESS",
"type": [
"string",
"null"
]
},
{
"name": "ARCHIVE_DATE",
"type": [
"null",
{
"type": "long",
"logicalType": "local-timestamp-millis"
}
]
},
{
"name": "EXEC_DATE",
"type": [
"null",
{
"type": "long",
"logicalType": "local-timestamp-millis"
}
]
},{
"name": "LOAD_DATE",
"type": [
"null",
{
"type": "int",
"logicalType": "date"
}
]
},
]
}

Now, as you can imagine, I will encounter some null value within the ARCHIVE_DATE column, because this is a normal behavior. If I let the flow execute as it is, the AVRO File get's generated even with the NULL value. 

However, I would like to change something. If within that column I have a value, I would like to modify the value and assign it a Timezone ... as the actual value gets converted into UTC and I do not want that.

How can I achieve this using the UpdateRecord Processor?  I am pretty sure that I will have to use an ifElse statement but in terms of values, I have tried several things and all of the ended with an error:

${field.value:isNull():ifElse('0', ${field.value:toNumber():toDate('yyyy-MM-dd HH:mm:ss.SSS','Europe/Bucharest'):format('yyyy-MM-dd HH:mm:ss.SSS','Europe/Bucharest')})}

The thing is that if I use '0', it works and assigns the value 0 if the field value is empty or null (the column contains no value, no space, not even the string null). However, if I replace '0' with '' I receive an error message for NumberConversion. 

In this case, how can I make sure that the values from within that column remain null/empty if they are like that and otherwise, apply the Europe/Bucharest timezone for that specific field value.

5 REPLIES 5

avatar

@MattWho , @stevenmatison , @SAMSAL : might you have some insight for this matter? I have been struggling with it for the last couple of days and I do not really know what to try next. 

avatar

just to add some further infos: the value I am extracting from my Oracle database is  2012-05-21 23:59:35 and within the AVRO File (which reaches my UpdateRecord Processor) the value is 1337633975000. If we transform the value from the AVRO File, we will see that we are talking about 012-05-21 20:59:35 (UTC Timezone) and I need to as it is, in Europe/Bucharest.

I have tried ${field.value:isNull():ifElse('0', ${field.value:toDate():format('yyyy-MM-dd HH:mm:ss','Europe/Bucharest')})} but I get 

java.lang.NumberFormatException: For input string: "2012-05-21 23:59:35"

I have also tried ${field.value:isNull():ifElse('0', ${field.value:toDate('yyyy-MM-dd HH:mm:ss','Europe/Bucharest'):format('yyyy-MM-dd HH:mm:ss','Europe/Bucharest')})} but I get 

Cannot parse attribute value as a date; date format: yyyy-MM-dd HH:mm:ss; attribute value: 1337633975000


And the part where I keep the value the same as it is, if the original value is empty/null .... that I have not yet solved. 

avatar
Super Guru

Hi @cotopaul ,

I'm not sure if this will help but the reason you are getting the error:

java.lang.NumberFormatException: For input string: "2012-05-21 23:59:35"

is because the expression language you provided will generate formatted string date but the avro schema expects long, so is it possible to change EL where you add toNumber() at the end of toDate conversion, or change the data type in the avro schema to string if that works. Another thing to avoid all this confusion of having to worry about timezone and null values , is it possible to reformat the date fields and convert to long before you pull out of Oracle.

Hope that helps in any way.

 

 

avatar

@SAMSAL thank you for your response 🙂 The thing is that I 100% require the value to be extracted and exported as long with logical type = local-timestamp-millis, as this is the only format which gets inserted automatically in a DATETIME column within BigQuery.... meaning that using a string is not acceptable in any way.

After exporting the data out of the database, the column looks as follows within the AVRO File. This value get transformed into UTC automatically, due to the AVRO Reader and AVRO Writer. The AVRO Writer contained the schema long with local-timestamp-millis when extracting the value.

 

datafromavro.png

 However, if I open the AVRO File anywhere else but NiFi, I get the value 1337633975000, which is ok. Now, I am trying to transform it within Europe/Bucharest timezone, as I require it like this. 

I have tried using ${field.value:isEmpty():ifElse(${field.value}, ${field.value:toDate():format('yyyy-MM-dd HH:mm:ss','Europe/Bucharest'):toNumber()})} as well, but without success as toNumber will somehow get "" as input :(. 

I also tried extracting the date from the DB directly as long .... the value gets transformed directly into UTC. I have no idea what to try next and how to proceed further. 

avatar
Super Guru

I know this sounds like batching the problem but after the UpdateRecord can you do JoltTransformRecord to transform "" to null for the target field? here is an example of spec that can do such thing:

https://github.com/bazaarvoice/jolt/issues/667