Support Questions

Find answers, ask questions, and share your expertise

Conversion from STRING to DATE - CSV to AVRO conversion

avatar

Hi guys,

So I have been struggling with a data conversion and I can't really figure out how to achieve what I am trying to achieve.
I have a CSV File which comes into my flow as follows:

COLUMN1 COLUMN2 COLUMN3 COLUMN4 COLUMN5 COLUMN6 MONTH
0182 Tel set W27 0 2200 31-IAN-22
0183 Apa cai W27 0 2200 30-SEP-22
0185 HDM set MT8 1 2200 28-FEB-22
0185 Apa alo MT8 0 2200 31-OCT-22
0186 HDM set HG5 1 2200 30-IUN-22
0188 Fus alo HG5 1 2200 30-APR-22

I am using afterwards an ConvertRecord to transform the CSV into an AVRO File, using the following schema:

{
"type": "record",
"name": "nifiRecord",
"namespace": "org.apache.nifi",
"fields": [
{
"name": "COLUMN1",
"type": [
"string",
"null"
]
},
{
"name": "COLUMN2",
"type": [
"string",
"null"
]
},
{
"name": "COLUMN3",
"type": [
"string",
"null"
]
},
{
"name": "COLUMN4",
"type": [
"string",
"null"
]
},
{
"name": "COLUMN5",
"type": [
"string",
"null"
]
},
{
"name": "COLUMN6",
"type": [
"int",
"null"
]
},
{
"name": "MONTH",
"type": [
"string",
"null"
]
}
]
}

Now, in the next step, I would like to transform the MONTH from STRING into DATE, so I could insert this value into a BigQuery Table (target column is DATE).
For that, I am using an UpdateRecord Processor in which I tried several NiFi Expression Language tests, but neither work for me.
Basically, the schema will have to change into:

{
"name": "MONTH",
"type": [
"null",
{
"type": "int",
"logicalType": "date"
}
]
}

Unfortunately, when trying to convert that string date into a normal date, i keep on encountering strange errors.
What I am trying to have is 31-IAN-22 as 31-JAN-22 (or 31-01-22) in the generated AVRO File as an INT-DATE.
As you can see, the month itself is not necessary written in english.
I have tried several ELs:

${field.value:replace('SEP', '09'):toDate('dd-MM-yy'):format('dd-MMM-yy'):toDate('dd-MMM-yy'):toNumber()}
${field.value:toDate('dd-MM-yy'):format('dd-MMM-yy'):toDate('dd-MMM-yy'):toNumber()}
${field.value:toDate('dd-MM-yy'):format('dd-MMM-yy'):toNumber()}
${field.value:toDate('dd-MM-yy'):toNumber()}

Every time I receives some strange errors:

org.apache.nifi.attribute.expression.language.exception.IllegalAttributeException: Cannot parse attribute value as a date; date format: yyyy-mm-dd; attribute value: 30-SEP-23
java.lang.NullPointerException: null
org.apache.nifi.serialization.record.util.IllegalTypeConversionException: Failed Conversion of Field [MONTH] from String [1696021200000] to LocalDate


Does anybody know how I could achieve this?

Thanks 🙂

1 ACCEPTED SOLUTION

avatar

I managed to solve my issue using UpdateRecord with Literal Value replacement strategy. I have defined the following EL:

 ${field.value:toDate('dd-MMM-yy'):format('yyyy-MM-dd')}


On the other hand, the avro schema remained the same, "type": "int" and "logicalType": "date".


As for replacing the month letters, I have hard coded a very ugly IF-ELSE statement:

${field.value:contains('IAN'):ifElse(${field.value:replace('IAN','JAN')},${field.value:contains('IUN'):ifElse(${field.value:replace('IUN','JUN')},${field.value:contains('IUL'):ifElse(${field.value:replace('IUL','JUL')},${field.value})})}):toDate('dd-MMM-yy'):format('yyyy-MM-dd')}


PS: this link helped me a lot: https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html

View solution in original post

1 REPLY 1

avatar

I managed to solve my issue using UpdateRecord with Literal Value replacement strategy. I have defined the following EL:

 ${field.value:toDate('dd-MMM-yy'):format('yyyy-MM-dd')}


On the other hand, the avro schema remained the same, "type": "int" and "logicalType": "date".


As for replacing the month letters, I have hard coded a very ugly IF-ELSE statement:

${field.value:contains('IAN'):ifElse(${field.value:replace('IAN','JAN')},${field.value:contains('IUN'):ifElse(${field.value:replace('IUN','JUN')},${field.value:contains('IUL'):ifElse(${field.value:replace('IUL','JUL')},${field.value})})}):toDate('dd-MMM-yy'):format('yyyy-MM-dd')}


PS: this link helped me a lot: https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html