Support Questions

Find answers, ask questions, and share your expertise

NiFi: Convert data into lower case

avatar
Rising Star
@Shu

We have headers coming in multiple sized like in lower and upper case. As we are converting CSV to Json, we are first converting the csv schema into Avro Schema where avro is not accepting upper case column name.

Kindly advice how to convert upper case into lower case.

Thanks in advance.

1 ACCEPTED SOLUTION

avatar
Master Guru

@Mustafa Ali Qizilbash

We can achieve this case in two ways using UpdateRecord Processor ,QueryRecord processor

Using UpdateRecord processor:

As your data having Header so use Csv reader use

91433-csvreader.png

Then use UpdateRecord processor with

Replacement Value Strategy

Record Path Value

Then swap the record path value, add new property as

/lowercase_filed_name

/upper_lowercase_field_name

Refer to this link for more details regards to UpdateRecord processor.

Configure the Record Writer avro schema registry matching with your new filed names(i.e lowercase filed names).

Then processor will swap the values from the record path and keep them to your new lowercase field names.

2.Using QueryRecord processor:

Add new property in QueryRecord processor and keep the query as

select col1,col2,col3.. from FLOWFILE

Then configure the Record Writer controller service avro schema matching with the select statement column names(case sensitive), Then processor will write the flowfile with new filed names.

For more details regards to convert record processor refer to this link.

You can use either of these methods choose which is better fit for your case.

I tried some simple flow using both processors, Use the attached flow xml as a reference and change per your case

91435-flow.png

222899-updaterecord-queryrecord.xml

View solution in original post

9 REPLIES 9

avatar
Master Guru

@Mustafa Ali Qizilbash

Could you add more details regards to Avro is not accepting Upper case column name means?
Some sample data will be useful to understand root cause of the issue..

avatar
Rising Star
@Shu

Data coming in ConvertRecord from CSV to AVRO schema.

91430-incomingdata.png

Avro_Schema

91431-acvo-schema.png

Outgoing Data

91432-outgoingdata.png

As you can see, incoming data have column name in different format. I changed the Avro Schema to reflect the same column name format but then it loaded NULL values in all the columns.

avatar
Master Guru

@Mustafa Ali Qizilbash

We can achieve this case in two ways using UpdateRecord Processor ,QueryRecord processor

Using UpdateRecord processor:

As your data having Header so use Csv reader use

91433-csvreader.png

Then use UpdateRecord processor with

Replacement Value Strategy

Record Path Value

Then swap the record path value, add new property as

/lowercase_filed_name

/upper_lowercase_field_name

Refer to this link for more details regards to UpdateRecord processor.

Configure the Record Writer avro schema registry matching with your new filed names(i.e lowercase filed names).

Then processor will swap the values from the record path and keep them to your new lowercase field names.

2.Using QueryRecord processor:

Add new property in QueryRecord processor and keep the query as

select col1,col2,col3.. from FLOWFILE

Then configure the Record Writer controller service avro schema matching with the select statement column names(case sensitive), Then processor will write the flowfile with new filed names.

For more details regards to convert record processor refer to this link.

You can use either of these methods choose which is better fit for your case.

I tried some simple flow using both processors, Use the attached flow xml as a reference and change per your case

91435-flow.png

222899-updaterecord-queryrecord.xml

avatar
Rising Star

@Shu

UpdateRecord Error: Can not write to schema, schema is unknow.

91440-updaterecord-error.png

UpdateRecord Properties

91441-updaterecord-properties.png

CSV Reader

91442-csv-reader.png

CSV Writer

91443-csv-writer.png

avatar
Master Guru
@Mustafa Ali Qizilbash

In your CSVSetWriter controller service configure as

92452-csvsetwriter.png

in Schema Text Property define new schema i.e. avro schema with lowercase field names.

Let us know if you are having any issues..

avatar
Rising Star
@Shu

It got successful but out out was empty.

Source Header: Aim is to convert mentioned last 2 columns in lower case.

hw_3g_result_time,hw_3g_granularity_period,hw_3g_bsc_name,hw_3g_bsc_type,hw_3g_cell_name,hw_3g_cell_id,hw_3g_cell_index,hw_3g_reliability,VSRABAttEstabAMR,VSRABSuccEstabCSAMR

CSVWriter Properties

91447-csvwriter-properties.png

Avro Schema

91448-avroschemaregistry.png

UpdateRecord Property

Source Header: Aim is to convert mentioned last 2 columns in lower case.

hw_3g_result_time,hw_3g_granularity_period,hw_3g_bsc_name,hw_3g_bsc_type,hw_3g_cell_name,hw_3g_cell_id,hw_3g_cell_index,hw_3g_reliability,VSRABAttEstabAMR,VSRABSuccEstabCSAMR

91449-updaterecord.png

avatar
Master Guru

@Mustafa Ali Qizilbash

Add change the added dynamic properties in update record processor as shown below:

92454-screen-shot-2018-09-15-at-74320-am.png

/vsrabsuccestabcsamr

/VSRABSuccEstabCSAMR

/vsrabattestabamr

/VSRABAttEstabAMR

As we swapping the data to lower case, so keep the lowercase field names as property name and upper lower mix filed name as property value.

Then processor will add the field value to the newly added field names.

avatar
Rising Star

@Shu

pmresult-67109368-60-201804260000-201804260100-ori.zip (Please unzip, its a csv file)

sample-xml.xml

Same result :-(. Also attaching XML and Source data file for testing.

91450-updaterecords-properties.png

Empty Queue

91451-empty-queue.png

Avro Schema:

{ "type":"record", "name":"jazz", "fields":[ {"name":"hw_3g_result_time", "type":["null","string"]}, {"name":"hw_3g_granularity_period", "type":["null","string"]}, {"name":"hw_3g_bsc_name", "type":["null","string"]}, {"name":"hw_3g_bsc_type", "type":["null","string"]}, {"name":"hw_3g_cell_name", "type":["null","string"]}, {"name":"hw_3g_cell_id", "type":["null","string"]}, {"name":"hw_3g_cell_index", "type":["null","string"]}, {"name":"hw_3g_reliability", "type":["null","string"]}, {"name":"vsrabattestabamr", "type":["null","string"]}, {"name":"vsrabsuccestabcsamr", "type":["null","string"]}] }

avatar
Master Guru

@Mustafa Ali Qizilbash

I tried with your UpdateRecord configs, not able to get your csv data(I used some sample data from your question screenshot) and the flow worked as expected.

InputData:

hw_3g_result_time,hw_3g_granularity_period,hw_3g_bsc_name,hw_3g_bsc_type,hw_3g_cell_name,hw_3g_cell_id,hw_3g_cell_index,hw_3g_reliability,VSRABAttEstabAMR,VSRABSuccEstabCSAMR
2018-04-26 0000,60,ULHR,BSC,UQRUR,48063,216,Reliable,33333,33
2018-04-26 0000,60,ULHR,BSC,UQRUR,48063,216,Reliable,77777,7

OutputData:

hw_3g_result_time,hw_3g_granularity_period,hw_3g_bsc_name,hw_3g_bsc_type,hw_3g_cell_name,hw_3g_cell_id,hw_3g_cell_index,hw_3g_reliability,vsrabattestabamr,vsrabsuccestabcsamr
2018-04-26 0000,60,ULHR,BSC,UQRUR,48063,216,Reliable,33333,33
2018-04-26 0000,60,ULHR,BSC,UQRUR,48063,216,Reliable,77777,7

I attached the sample flow template that I have tried update-record-222899.xml .Use the attached flow xml as a reference and change per your case.