[NiFi] Converting a delimited FlowFile's content to attributes

Hi, guys,

I have an incoming FlowFile whose content is text delimited by pipes ('|'), and I want to send this information to several destinations. To convert it to JSON, for example, I know I can use the AttributesToJSON processor, but how exactly can I access the FlowFile content and convert its fields to attributes?


e.g.

original FlowFile content:

1234567891285|37797|1| the brown fox

FlowFile attributes (after converting):

id = 1234567891285
sequence = 37797
category = 1
text = the brown fox
... and after that I could use AttributesToJSON to generate my JSON file.

Any ideas on how to achieve this?

Thanks in advance!

Cheers.

1 ACCEPTED SOLUTION

Master Guru

You don't have to extract the fields to attributes if you are converting the content to a different format. Instead, you can use ConvertRecord with a CSVReader configured with a custom format (a pipe delimiter, for instance) and name your fields in the Avro schema. Then, in ConvertRecord, you can set a JsonRecordSetWriter to convert to JSON. This same approach works for any supported output format, or you can even write your own ScriptedRecordSetWriter if you need a custom format.

If you do need to extract to attributes, you can use ExtractText with a regular expression that matches each field, and you can add user-defined properties to extract the group(s) into their associated attributes (the property name is the field name such as "id" or "sequence", and the value is the grouping expression, perhaps $2, $3, etc.)
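
To make the regular-expression route concrete, here is a minimal Python sketch (not NiFi configuration) of a pattern with one capture group per field, applied to the sample line from the question. The pattern and the helper name extract_attributes are assumptions for illustration. In ExtractText itself, as far as I know, each user-defined property holds a regular expression and the captured group becomes the value of the attribute named after the property, so check the processor documentation for the exact attribute naming.

```python
import re

# One capture group per pipe-delimited field. The pattern is an assumption
# based on the four-field sample in the question.
LINE_PATTERN = re.compile(r"^([^|]*)\|([^|]*)\|([^|]*)\|(.*)$")

# Attribute names taken from the question.
FIELD_NAMES = ("id", "sequence", "category", "text")


def extract_attributes(content: str) -> dict:
    """Map each capture group of the first line to a named attribute."""
    lines = content.splitlines()
    if not lines:
        return {}
    match = LINE_PATTERN.match(lines[0])
    if match is None:
        return {}  # the line does not have the expected four fields
    # Strip surrounding whitespace, e.g. the space before "the brown fox".
    return {name: value.strip() for name, value in zip(FIELD_NAMES, match.groups())}


attributes = extract_attributes("1234567891285|37797|1| the brown fox")
print(attributes)
```

Testing the pattern against a sample line outside NiFi like this is a quick way to rule out the regular expression when ExtractText does not produce the attributes you expect.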


8 REPLIES

Hi @Cesar Rodrigues

The cleanest way should be to use the ConvertRecord processor with a CSVReader (with the delimiter set to a pipe) and a JsonRecordSetWriter.

This converts your CSV directly into JSON without going through attributes. Using Record processors also gives you better performance.
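
As a rough illustration of what this reader/writer pair does to the content, here is a small Python sketch that reads pipe-delimited text and writes JSON records. It is only an analogy, not NiFi code: the function name pipe_delimited_to_json is made up, the field names are taken from the question, and every value stays a string here, whereas in NiFi the schema would decide the types.

```python
import csv
import io
import json

# Field names taken from the question; in NiFi they would come from the schema.
FIELD_NAMES = ["id", "sequence", "category", "text"]


def pipe_delimited_to_json(content: str) -> str:
    """Read pipe-delimited lines and return them as a JSON array of records."""
    reader = csv.reader(io.StringIO(content), delimiter="|")
    records = [
        dict(zip(FIELD_NAMES, (value.strip() for value in row)))
        for row in reader
        if row  # skip empty lines
    ]
    return json.dumps(records)


result = pipe_delimited_to_json("1234567891285|37797|1| the brown fox\n")
print(result)
```

The point of the sketch is that the content goes from one format to the other in a single pass, with no per-field attributes in between.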

Thanks

@Abdelkrim Hadjidj

Thank you! Could you please provide more details on how to use the schema registry? I'm having some trouble with that.

@Matt Burgess

I've never used the Avro Schema before. Could you please explain how to name the fields in it? I checked the documentation, but it's a little bit confusing.

Thanks in advance!

Master Guru

Avro Schemas can be confusing the first couple of times you create them 🙂 In your case you could use the following:

{
  "namespace": "nifi",
  "name": "cesarPipeDelimitedRecord",
  "type": "record",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "sequence", "type": "int"},
    {"name": "category", "type": "int"},
    {"name": "text", "type": "string"}
  ]
}

If a field can have missing values, you can replace its type with a union that includes "null". For example, if "category" can be missing, its field entry can be

{"name": "category","type": ["null","int"]},

Thanks, @Matt Burgess! This helped a lot 😉

Contributor

Hi @Matt Burgess, what if I just want to keep the content as attributes? In my scenario, a user supplies parameters through a CSV file, which I parse and use as attributes. For example, if the user wants to import a table, they write the table name in the CSV, and I use it as an attribute in the FlowFile.

My present approach is:

ListFile -> FetchFile -> SplitFile -> ExtractText -> UpdateAttribute. But it doesn't seem to be working out. Any suggestions?

Master Guru

What issues are you having? That flow description seems like it should work. Perhaps your regular expression or other config of ExtractText needs tweaking?