Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How to extract fields enveloped in quotes in NiFi?

avatar
Explorer

I have some pipe delimited files. Each field is bounded by quotes like this.

"Created_Date__c"|"CreatedById"|"CreatedDate"|"Guid_c"
"2020-03-02 00:00:00"|"0053i000002XCpAAG"|"2020-03-02 16:01:34"|"94bf83ccf9daf610VgnVCM100000307882a2RCRD"
"2020-03-03 00:00:00"|"0053i000002XCpAAG"|"2020-03-03 09:15:56"|"1a4bb238cdedd610VgnVCM100000307882a2RCRD"
"2020-03-03 00:00:00"|"0053i000002XCpAAG"|"2020-03-03 09:52:33"|"22408baca6fee610VgnVCM100000307882a2RCRD"

I need to cleanse this data and the needs to look like this.


Created_Date__c|CreatedById|CreatedDate|Guid_c
2020-03-02 00:00:00|0053i000002XCpAAG|2020-03-02 16:01:34|94bf83ccf9daf610VgnVCM100000307882a2RCRD
2020-03-03 00:00:00|0053i000002XCpAAG|2020-03-03 09:15:56|1a4bb238cdedd610VgnVCM100000307882a2RCRD
2020-03-03 00:00:00|0053i000002XCpAAG|2020-03-03 09:52:33|22408baca6fee610VgnVCM100000307882a2RCRD

 

I tried using ReplaceText with these configurations.
search value - ^"(.*)"$ and Replacement Value - $1. But these configurations is not working and the file is routing to failure. not sure what might be the issue.

open to other suggestions. Thanks in advance.

6 REPLIES 6

avatar
Super Guru

@naga_satish  I think the solution you will find easiest is to use a CSV record reader and writer.  In the reader you tell schema/settings it is pipe delimited, and escaped with quotes.  In the writer you choose same schema no quotes.  It should be straight forward use case for RecordReaders in UpdateRecord/QueryRecord/LookupRecord or other RecordReader based processors.

 

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.  

 

Thanks,


Steven

avatar
Explorer

@stevenmatison I already tried using convert record processor with the configurations that you mentioned. I use InferSchema as schema strategy. this configuration throughs an error. It says index for header "Created_Date__c" is 1 but csv record have only one value. Not sure what is this error.

 

When I set schema strategy as "use string fields from header" entire data is messed up.

 

When I set schema strategy as schema text everything is empty, just delimiters and header is present.

 

avatar
Super Guru

You will need to define your schema in avro format, drop that into the readers/writers.   Here is an example:

 

{
   "type" : "record",
   "name" : "DailyCSV",
   "fields" : [
      { "name" : "DepartmentName" , "type" : ["string", "null"] },
	  { "name" : "AccountName" , "type" : ["string", "null"] },
	  { "name" : "AccountOwnerId", "type" : ["string", "null"] },
      { "name" : "AdditionalInfo", "type" : [ "null", "string" ] }
]
}

 

avatar
Explorer

@stevenmatison That's precisely what I did. Every column in processed file is empty. just header name and delimiters are present.

avatar
Contributor

Hello,

 

Can you show the config of your CSV recordReader processor? Your CSV looks nice, there is no need to replace anything here.

In order to simplify the debug, you can also select "Infer schema" instead of using a Avro schema. It's of course better to work with Avro schema when you go to production

avatar
Super Guru

As suggested above, update post with your processor, its reader and writier settings.  It sounds like you have something misconfigured.  If possible show us a screen shot of your flow too.