Support Questions

mustafa_qizilba · ‎09-13-2018

@Shu

We have headers coming in multiple sized like in lower and upper case. As we are converting CSV to Json, we are first converting the csv schema into Avro Schema where avro is not accepting upper case column name.

Kindly advice how to convert upper case into lower case.

Thanks in advance.

Shu_ashu · ‎09-14-2018

@Mustafa Ali Qizilbash

We can achieve this case in two ways using UpdateRecord Processor ,QueryRecord processor

Using UpdateRecord processor:

As your data having Header so use Csv reader use

Then use UpdateRecord processor with

Replacement Value Strategy

Record Path Value

Then swap the record path value, add new property as

/lowercase_filed_name

/upper_lowercase_field_name

Refer to this link for more details regards to UpdateRecord processor.

Configure the Record Writer avro schema registry matching with your new filed names(i.e lowercase filed names).

Then processor will swap the values from the record path and keep them to your new lowercase field names.

2.Using QueryRecord processor:

Add new property in QueryRecord processor and keep the query as

select col1,col2,col3.. from FLOWFILE

Then configure the Record Writer controller service avro schema matching with the select statement column names(case sensitive), Then processor will write the flowfile with new filed names.

For more details regards to convert record processor refer to this link.

You can use either of these methods choose which is better fit for your case.

I tried some simple flow using both processors, Use the attached flow xml as a reference and change per your case

222899-updaterecord-queryrecord.xml

View solution in original post

Shu_ashu · ‎09-13-2018

@Mustafa Ali Qizilbash

Could you add more details regards to Avro is not accepting Upper case column name means?
Some sample data will be useful to understand root cause of the issue..

mustafa_qizilba · ‎09-14-2018

@Shu

Data coming in ConvertRecord from CSV to AVRO schema.

Avro_Schema

Outgoing Data

As you can see, incoming data have column name in different format. I changed the Avro Schema to reflect the same column name format but then it loaded NULL values in all the columns.

Shu_ashu · ‎09-14-2018

@Mustafa Ali Qizilbash

We can achieve this case in two ways using UpdateRecord Processor ,QueryRecord processor

Using UpdateRecord processor:

As your data having Header so use Csv reader use

Then use UpdateRecord processor with

Replacement Value Strategy

Record Path Value

Then swap the record path value, add new property as

/lowercase_filed_name

/upper_lowercase_field_name

Refer to this link for more details regards to UpdateRecord processor.

Configure the Record Writer avro schema registry matching with your new filed names(i.e lowercase filed names).

Then processor will swap the values from the record path and keep them to your new lowercase field names.

2.Using QueryRecord processor:

Add new property in QueryRecord processor and keep the query as

select col1,col2,col3.. from FLOWFILE

Then configure the Record Writer controller service avro schema matching with the select statement column names(case sensitive), Then processor will write the flowfile with new filed names.

For more details regards to convert record processor refer to this link.

You can use either of these methods choose which is better fit for your case.

I tried some simple flow using both processors, Use the attached flow xml as a reference and change per your case

222899-updaterecord-queryrecord.xml

mustafa_qizilba · ‎09-14-2018

@Shu

UpdateRecord Error: Can not write to schema, schema is unknow.

UpdateRecord Properties

CSV Reader

CSV Writer

Shu_ashu · ‎09-14-2018

@Mustafa Ali Qizilbash

In your CSVSetWriter controller service configure as

in Schema Text Property define new schema i.e. avro schema with lowercase field names.

Let us know if you are having any issues..

mustafa_qizilba · ‎09-15-2018

@Shu

It got successful but out out was empty.

Source Header: Aim is to convert mentioned last 2 columns in lower case.

hw_3g_result_time,hw_3g_granularity_period,hw_3g_bsc_name,hw_3g_bsc_type,hw_3g_cell_name,hw_3g_cell_id,hw_3g_cell_index,hw_3g_reliability,VSRABAttEstabAMR,VSRABSuccEstabCSAMR

CSVWriter Properties

Avro Schema

UpdateRecord Property

Source Header: Aim is to convert mentioned last 2 columns in lower case.

hw_3g_result_time,hw_3g_granularity_period,hw_3g_bsc_name,hw_3g_bsc_type,hw_3g_cell_name,hw_3g_cell_id,hw_3g_cell_index,hw_3g_reliability,VSRABAttEstabAMR,VSRABSuccEstabCSAMR

Shu_ashu · ‎09-15-2018

@Mustafa Ali Qizilbash

Add change the added dynamic properties in update record processor as shown below:

/vsrabsuccestabcsamr

/VSRABSuccEstabCSAMR

/vsrabattestabamr

/VSRABAttEstabAMR

As we swapping the data to lower case, so keep the lowercase field names as property name and upper lower mix filed name as property value.

Then processor will add the field value to the newly added field names.

mustafa_qizilba · ‎09-15-2018

@Shu

pmresult-67109368-60-201804260000-201804260100-ori.zip (Please unzip, its a csv file)

sample-xml.xml

Same result :-(. Also attaching XML and Source data file for testing.

Empty Queue

Avro Schema:

{ "type":"record", "name":"jazz", "fields":[ {"name":"hw_3g_result_time", "type":["null","string"]}, {"name":"hw_3g_granularity_period", "type":["null","string"]}, {"name":"hw_3g_bsc_name", "type":["null","string"]}, {"name":"hw_3g_bsc_type", "type":["null","string"]}, {"name":"hw_3g_cell_name", "type":["null","string"]}, {"name":"hw_3g_cell_id", "type":["null","string"]}, {"name":"hw_3g_cell_index", "type":["null","string"]}, {"name":"hw_3g_reliability", "type":["null","string"]}, {"name":"vsrabattestabamr", "type":["null","string"]}, {"name":"vsrabsuccestabcsamr", "type":["null","string"]}] }

Shu_ashu · ‎09-15-2018

@Mustafa Ali Qizilbash

I tried with your UpdateRecord configs, not able to get your csv data(I used some sample data from your question screenshot) and the flow worked as expected.

InputData:

hw_3g_result_time,hw_3g_granularity_period,hw_3g_bsc_name,hw_3g_bsc_type,hw_3g_cell_name,hw_3g_cell_id,hw_3g_cell_index,hw_3g_reliability,VSRABAttEstabAMR,VSRABSuccEstabCSAMR
2018-04-26 0000,60,ULHR,BSC,UQRUR,48063,216,Reliable,33333,33
2018-04-26 0000,60,ULHR,BSC,UQRUR,48063,216,Reliable,77777,7

OutputData:

hw_3g_result_time,hw_3g_granularity_period,hw_3g_bsc_name,hw_3g_bsc_type,hw_3g_cell_name,hw_3g_cell_id,hw_3g_cell_index,hw_3g_reliability,vsrabattestabamr,vsrabsuccestabcsamr
2018-04-26 0000,60,ULHR,BSC,UQRUR,48063,216,Reliable,33333,33
2018-04-26 0000,60,ULHR,BSC,UQRUR,48063,216,Reliable,77777,7

I attached the sample flow template that I have tried update-record-222899.xml .Use the attached flow xml as a reference and change per your case.

Cloudera Community

Support Questions

NiFi: Convert data into lower case