Created 02-07-2018 05:45 PM
Hi, guys,
So I have an incoming FlowFile whose content is text delimited by pipes ('|'), and I want to send this information to several destinations. To convert it to JSON, for example, I know I can use the AttributesToJSON processor, but how exactly can I access the FlowFile content and convert its fields to attributes?
e.g.
original FlowFile content:
1234567891285|37797|1| the brown fox
FlowFile attributes (after converting):
id = 1234567891285
sequence = 37797
category = 1
text = the brown fox
... and after that I could use AttributesToJSON to generate my JSON file.
Any ideas on how to achieve this?
Thanks in advance!
Cheers.
Created 02-07-2018 05:59 PM
You don't have to extract the fields to attributes if you are converting the contents to a different format. Instead, you can use ConvertRecord with a CSVReader with a custom format (a pipe delimiter, for instance) and name your fields in the Avro schema. Then in ConvertRecord you can set a JsonRecordSetWriter to convert to JSON. This same approach will work for any supported output format, or you can even write your own ScriptedRecordSetWriter if you need a custom format.
If you do need to extract to attributes, you can use ExtractText with a regular expression that matches each field, and you can add user-defined properties to extract the group(s) into their associated attributes (the property name is the field name such as "id" or "sequence", and the value is the grouping expression, perhaps $2, $3, etc.)
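To try out a regular expression before putting it into ExtractText, a rough sketch like the following can help. It is plain Python rather than NiFi configuration (NiFi uses Java regular expressions, which behave the same for a simple pattern like this one), and the names LINE_PATTERN, FIELD_NAMES and extract_attributes are made up for illustration.

```python
import re

# One capture group per pipe-delimited field (assumes exactly four fields).
LINE_PATTERN = re.compile(r"^([^|]*)\|([^|]*)\|([^|]*)\|(.*)$")

# Hypothetical attribute names, taken from the example in the question.
FIELD_NAMES = ["id", "sequence", "category", "text"]


def extract_attributes(line: str) -> dict:
    """Map each capture group to its attribute name; empty dict if no match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return {}
    # Trim surrounding whitespace so " the brown fox" becomes "the brown fox".
    return {name: value.strip() for name, value in zip(FIELD_NAMES, match.groups())}


attributes = extract_attributes("1234567891285|37797|1| the brown fox")
print(attributes)
```

A line without three pipes does not match and yields no attributes, which is roughly the case you would want to route separately in a flow.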
Created 02-07-2018 05:50 PM
The cleanest way should be to use the ConvertRecord processor with a CSVReader (with the delimiter set to a pipe) and a JsonRecordSetWriter.
This converts your CSV directly into JSON without going through attributes. Using Record processors also gives you better performance.
Thanks
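For a feel of what the reader/writer pair does to the content, here is a minimal sketch in plain Python. It is not NiFi code and does not use any NiFi API; it only mimics the idea of reading pipe-delimited rows with named fields and writing them out as JSON. The field names come from the question, and pipe_delimited_to_json is a hypothetical helper.

```python
import csv
import io
import json

# Field names as given in the question; in NiFi they would come from the schema.
FIELD_NAMES = ["id", "sequence", "category", "text"]


def pipe_delimited_to_json(content: str) -> str:
    """Read pipe-delimited rows and return them as a JSON array of objects."""
    reader = csv.reader(io.StringIO(content), delimiter="|")
    records = [
        dict(zip(FIELD_NAMES, (value.strip() for value in row)))
        for row in reader
        if row  # skip blank lines
    ]
    return json.dumps(records)


result = pipe_delimited_to_json("1234567891285|37797|1| the brown fox\n")
print(result)
```

Every value stays a string here; in the record-based approach the types would be driven by the schema instead.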
Created 02-07-2018 08:08 PM
Thank you! Could you please provide more details on how to use the schema registry? I'm having some trouble with that.
Created 02-07-2018 08:06 PM
I've never used the Avro Schema before. Could you please explain how to name the fields in it? I checked the documentation, but it's a little bit confusing.
Thanks in advance!
Created 02-07-2018 08:19 PM
Avro Schemas can be confusing the first couple of times you create them 🙂 In your case you could use the following:
{
  "namespace": "nifi",
  "name": "cesarPipeDelimitedRecord",
  "type": "record",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "sequence", "type": "int"},
    {"name": "category", "type": "int"},
    {"name": "text", "type": "string"}
  ]
}
If a field can have missing values, you can replace its type with a union. For example, if "category" can be missing, then its field entry can be
{"name": "category","type": ["null","int"]},
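To make the effect of the field names, types and the ["null","int"] union a bit more concrete, here is a small sketch in plain Python. It does not use an Avro library and is not how NiFi applies schemas internally; CASTS and apply_schema are hypothetical names, and treating an empty value as null is an assumption made for this illustration.

```python
import json

# The schema from above, with "category" declared as a nullable union.
SCHEMA = json.loads("""
{
  "namespace": "nifi",
  "name": "cesarPipeDelimitedRecord",
  "type": "record",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "sequence", "type": "int"},
    {"name": "category", "type": ["null", "int"]},
    {"name": "text", "type": "string"}
  ]
}
""")

# Only the two primitive types used in this schema are handled.
CASTS = {"string": str, "int": int}


def apply_schema(values, schema):
    """Pair each value with its field by position and cast it to the declared type."""
    record = {}
    for field, raw in zip(schema["fields"], values):
        raw = raw.strip()
        field_type = field["type"]
        if isinstance(field_type, list):  # a union such as ["null", "int"]
            if raw == "" and "null" in field_type:
                record[field["name"]] = None  # missing value is allowed
                continue
            field_type = next(t for t in field_type if t != "null")
        record[field["name"]] = CASTS[field_type](raw)
    return record


with_value = apply_schema("1234567891285|37797|1| the brown fox".split("|"), SCHEMA)
missing = apply_schema("1234567891285|37797|| the brown fox".split("|"), SCHEMA)
print(with_value)
print(missing)
```

Without the union, the empty "category" in the second line would fail the int conversion, which is the kind of problem the nullable type avoids.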
Created 02-14-2018 06:17 PM
Thanks, @Matt Burgess! This helped a lot 😉
Created 07-27-2018 04:41 PM
Hi @Matt Burgess, what if I just want to keep the content as an attribute? My scenario is that I want a user to supply parameters through a CSV file, which I can parse and use as attributes. For example, if a user wants to import a table, they will write the table name in the CSV, and I'll use that as an attribute in the FlowFile.
My present approach is:
ListFile->FetchFile->SplitFile->ExtractText->UpdateAttribute. But it doesn't seem to be working out. Any suggestions?
Created 07-27-2018 05:53 PM
What issues are you having? That flow description seems like it should work. Perhaps your regular expression or other config of ExtractText needs tweaking?