[Nifi] Converting a delimited FlowFile's content to attributes

Hi, guys,

So I have an incoming FlowFile whose content is text delimited by pipes ('|'), and I want to send this information to several destinations. To convert it to JSON, for example, I know I can use the AttributesToJSON processor, but how exactly can I access the FlowFile content and convert its fields to attributes?


e.g.

original FlowFile content:

1234567891285|37797|1| the brown fox

FlowFile attributes (after converting):

id = 1234567891285
sequence = 37797
category = 1
text = the brown fox

... and after that I could use AttributesToJSON to generate my JSON file.

Any ideas on how to achieve this?

Thanks in advance!

Cheers.

1 ACCEPTED SOLUTION

Re: [Nifi] Converting a delimited FlowFile's content to attributes

You don't have to extract the fields to attributes if you are converting the content to a different format. Instead, you can use ConvertRecord with a CSVReader configured with a custom format (a pipe delimiter, for instance) and name your fields in the Avro schema. Then in ConvertRecord you can set a JsonRecordSetWriter to convert to JSON. The same approach works for any supported output format, or you can even write your own ScriptedRecordSetWriter if you need a custom format.
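To make the idea of record-based conversion concrete, here is a minimal Python sketch of the same transformation done outside NiFi: read pipe-delimited lines, name the fields, and write JSON. It is only an illustration of what the reader/writer pair does conceptually, not NiFi's implementation. The function name `pipe_delimited_to_json` is made up for this example, and the field names come from the question. Values are kept as strings here; in NiFi the types would come from the schema.

```python
import csv
import io
import json

# Field names taken from the question's example.
FIELDS = ["id", "sequence", "category", "text"]

def pipe_delimited_to_json(content: str) -> str:
    """Parse pipe-delimited lines into records and serialize them as a JSON array."""
    reader = csv.reader(io.StringIO(content), delimiter="|")
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pair each value with its field name, trimming surrounding whitespace.
        records.append(dict(zip(FIELDS, (value.strip() for value in row))))
    return json.dumps(records)

sample = "1234567891285|37797|1| the brown fox\n"
print(pipe_delimited_to_json(sample))
```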

If you do need to extract to attributes, you can use ExtractText with a regular expression that matches each field, adding user-defined properties to extract the capture group(s) into their associated attributes. The property name is the attribute name, such as "id" or "sequence", and the value is a regular expression whose capture group picks out that field (a single expression with several groups yields indexed attributes such as "name.1", "name.2", etc.).
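The regular-expression approach can be tried outside NiFi before configuring the processor. The sketch below is Python, not NiFi configuration: it assumes exactly four pipe-separated fields per line, and the pattern and the helper name `extract_attributes` are illustrative choices. NiFi uses Java regular expressions, but this pattern only uses syntax that Python and Java share.

```python
import re

# One capture group per field; assumes exactly four pipe-separated fields.
PATTERN = re.compile(r"^([^|]*)\|([^|]*)\|([^|]*)\|\s*(.*)$")
ATTRIBUTE_NAMES = ["id", "sequence", "category", "text"]

def extract_attributes(line: str) -> dict:
    """Return a name -> value mapping, or an empty dict if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return {}
    return dict(zip(ATTRIBUTE_NAMES, match.groups()))

print(extract_attributes("1234567891285|37797|1| the brown fox"))
```

The `\s*` before the last group drops the leading space in front of "the brown fox", which the sample content contains.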

8 REPLIES

Re: [Nifi] Converting a delimited FlowFile's content to attributes

Hi @Cesar Rodrigues

The cleanest way should be to use the ConvertRecord processor with a CSVReader (using a pipe as the delimiter) and a JsonRecordSetWriter.

This converts your CSV directly into JSON without going through attributes. Using Record processors also gives you better performance.

Thanks

Re: [Nifi] Converting a delimited FlowFile's content to attributes

@Abdelkrim Hadjidj

Thank you! Could you please provide more details on how to use the schema registry? I'm having some trouble with that.

Re: [Nifi] Converting a delimited FlowFile's content to attributes

@Matt Burgess

I've never used the Avro Schema before. Could you please explain how to name the fields in it? I checked the documentation, but it's a little bit confusing.

Thanks in advance!

Re: [Nifi] Converting a delimited FlowFile's content to attributes

Avro Schemas can be confusing the first couple of times you create them :) In your case you could use the following:

{
  "namespace": "nifi",
  "name": "cesarPipeDelimitedRecord",
  "type": "record",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "sequence", "type": "int"},
    {"name": "category", "type": "int"},
    {"name": "text", "type": "string"}
  ]
}

If values can be missing, you can replace the type with a union. For example, if "category" can be missing, its field entry can be:

{"name": "category","type": ["null","int"]},
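To see how the field types and the nullable union shape the resulting record, here is a small Python sketch that applies the schema from this thread to one line of input. It is a hand-rolled interpretation for illustration only: it does not use an Avro library, it handles only the "string" and "int" types that appear here, and treating an empty field as null is an assumption of this sketch rather than a statement about how NiFi's CSVReader behaves. The helper names `coerce` and `parse_line` are made up.

```python
import json

# Schema from the thread, with "category" made nullable as described above.
SCHEMA = json.loads("""
{
  "namespace": "nifi",
  "name": "cesarPipeDelimitedRecord",
  "type": "record",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "sequence", "type": "int"},
    {"name": "category", "type": ["null", "int"]},
    {"name": "text", "type": "string"}
  ]
}
""")

CASTS = {"string": str, "int": int}  # only the types used in this schema

def coerce(raw: str, avro_type):
    """Cast one raw field value according to a primitive type or a union."""
    if isinstance(avro_type, list):  # union, e.g. ["null", "int"]
        if raw == "" and "null" in avro_type:
            return None  # assumption: an empty field means "missing"
        non_null = [t for t in avro_type if t != "null"]
        return coerce(raw, non_null[0])
    return CASTS[avro_type](raw)

def parse_line(line: str) -> dict:
    """Split a pipe-delimited line and type each value using the schema's fields."""
    values = [v.strip() for v in line.split("|")]
    return {
        field["name"]: coerce(value, field["type"])
        for field, value in zip(SCHEMA["fields"], values)
    }

print(parse_line("1234567891285|37797|| the brown fox"))
```

With the non-nullable `"type": "int"` entry instead, the empty "category" field in this input would fail the `int` cast, which is the situation the union is meant to avoid.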

Re: [Nifi] Converting a delimited FlowFile's content to attributes

Thanks, @Matt Burgess! This helped a lot ;)

Re: [Nifi] Converting a delimited FlowFile's content to attributes

Hi @Matt Burgess, what if I just want to keep the content as attributes? In my scenario, I want a user to supply parameters through a CSV file, which I can parse and use as attributes. For example, if the user wants to import a table, they write the table name in the CSV, and I use it as an attribute in the FlowFile.

My present approach is:

ListFile->FetchFile->SplitFile->ExtractText->UpdateAttribute, but it doesn't seem to be working. Any suggestions?

Re: [Nifi] Converting a delimited FlowFile's content to attributes

What issues are you having? That flow description seems like it should work. Perhaps your regular expression or other config of ExtractText needs tweaking?
