Created 11-27-2020 07:35 AM
I have some pipe delimited files. Each field is bounded by quotes like this.
"Created_Date__c"|"CreatedById"|"CreatedDate"|"Guid_c"
"2020-03-02 00:00:00"|"0053i000002XCpAAG"|"2020-03-02 16:01:34"|"94bf83ccf9daf610VgnVCM100000307882a2RCRD"
"2020-03-03 00:00:00"|"0053i000002XCpAAG"|"2020-03-03 09:15:56"|"1a4bb238cdedd610VgnVCM100000307882a2RCRD"
"2020-03-03 00:00:00"|"0053i000002XCpAAG"|"2020-03-03 09:52:33"|"22408baca6fee610VgnVCM100000307882a2RCRD"
I need to cleanse this data and the needs to look like this.
Created_Date__c|CreatedById|CreatedDate|Guid_c
2020-03-02 00:00:00|0053i000002XCpAAG|2020-03-02 16:01:34|94bf83ccf9daf610VgnVCM100000307882a2RCRD
2020-03-03 00:00:00|0053i000002XCpAAG|2020-03-03 09:15:56|1a4bb238cdedd610VgnVCM100000307882a2RCRD
2020-03-03 00:00:00|0053i000002XCpAAG|2020-03-03 09:52:33|22408baca6fee610VgnVCM100000307882a2RCRD
I tried using ReplaceText with these configurations.
search value - ^"(.*)"$ and Replacement Value - $1. But these configurations is not working and the file is routing to failure. not sure what might be the issue.
open to other suggestions. Thanks in advance.
Created on 11-28-2020 11:21 AM - edited 11-28-2020 11:23 AM
@naga_satish I think the solution you will find easiest is to use a CSV record reader and writer. In the reader you tell schema/settings it is pipe delimited, and escaped with quotes. In the writer you choose same schema no quotes. It should be straight forward use case for RecordReaders in UpdateRecord/QueryRecord/LookupRecord or other RecordReader based processors.
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.
Thanks,
Steven
Created 11-30-2020 04:11 AM
@stevenmatison I already tried using convert record processor with the configurations that you mentioned. I use InferSchema as schema strategy. this configuration throughs an error. It says index for header "Created_Date__c" is 1 but csv record have only one value. Not sure what is this error.
When I set schema strategy as "use string fields from header" entire data is messed up.
When I set schema strategy as schema text everything is empty, just delimiters and header is present.
Created 11-30-2020 04:59 AM
You will need to define your schema in avro format, drop that into the readers/writers. Here is an example:
{ "type" : "record", "name" : "DailyCSV", "fields" : [ { "name" : "DepartmentName" , "type" : ["string", "null"] }, { "name" : "AccountName" , "type" : ["string", "null"] }, { "name" : "AccountOwnerId", "type" : ["string", "null"] }, { "name" : "AdditionalInfo", "type" : [ "null", "string" ] } ] }
Created 11-30-2020 06:43 AM
@stevenmatison That's precisely what I did. Every column in processed file is empty. just header name and delimiters are present.
Created 11-30-2020 08:51 AM
Hello,
Can you show the config of your CSV recordReader processor? Your CSV looks nice, there is no need to replace anything here.
In order to simplify the debug, you can also select "Infer schema" instead of using a Avro schema. It's of course better to work with Avro schema when you go to production
Created 11-30-2020 11:57 AM
As suggested above, update post with your processor, its reader and writier settings. It sounds like you have something misconfigured. If possible show us a screen shot of your flow too.