Nifi convert text file consisting of key value pairs to avro

Contributor

I have a text file I'm reading into a Nifi flow, which consists of key value pairs that look like the following:

status:"400" body_bytes_sent:"174" referer:"google.com" user_agent:"safari" host:"8.8.4.4" query_string:"devices"

status:"400" body_bytes_sent:"172" referer:"yahoo.com" user_agent:"Chrome" host:"8.8.4.3" query_string:"books"

Currently the TailFile processor is successfully reading these files as they are created and appended to. However, I want to output them as Avro files to Kafka. Any idea which processor(s) I need to convert these text files into Avro format in my flow? What would the configuration look like for these processors?

1 ACCEPTED SOLUTION

Master Guru

Hi @Ed Prout,

If your text file consists of key-value pairs:
  1. Split the text line by line with the SplitText processor.
  2. Use a regex in the ExtractText processor to extract the values; they become attributes on each flowfile.
  3. Use the ReplaceText processor to write those attributes back as the content of the flowfile.
  4. Use the InferAvroSchema processor to derive a schema from the flowfile content.
  5. Use the ConvertCSVToAvro processor to convert the flowfile content to Avro format.
  6. Use a PublishKafka processor to publish the Avro data to the topic.
Configurations for the processors:-
SplitText:-
splittext.png
ExtractText:-
extracttext.png
ReplaceText:-
replcetext.png
InferAvroSchema:-
inferavroschema.png
ConvertCSVToAvro:-
convertcsvtoavro.png
Note:-

I used a pipe (|) as the delimiter in the ReplaceText replacement value. You can use any delimiter you like, but it must match the delimiter used in the InferAvroSchema CSV header definition.
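Outside NiFi, steps 1 to 3 and the delimiter note can be sketched in plain Python. This is only an illustration of the idea, not the actual processor configuration from the screenshots: the regex, the `line_to_row` helper, and the choice to derive the header from the keys are all assumptions made for the example.

```python
import re

# Assumed pattern: one key:"value" pair, e.g. status:"400".
# The real ExtractText regex in the screenshot may differ.
PAIR_RE = re.compile(r'(\w+):"([^"]*)"')


def line_to_row(line, delimiter="|"):
    """Turn one key:"value" log line into a delimited header and row.

    Mirrors what ExtractText (pull the values out) followed by
    ReplaceText (write them back as delimited content) achieves.
    """
    pairs = PAIR_RE.findall(line)
    header = delimiter.join(key for key, _ in pairs)
    row = delimiter.join(value for _, value in pairs)
    return header, row


sample = 'status:"400" body_bytes_sent:"174" referer:"google.com"'
header, row = line_to_row(sample)
print(header)  # status|body_bytes_sent|referer
print(row)     # 400|174|google.com
```

The same delimiter appears in both the header and the row, which is the point of the note above: whatever character the row uses, the CSV header definition has to use it too.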

I used just a few key-value pairs as input.
Input:-
status:"400" body_bytes_sent:"174" referer:"google.com"
Output after the ConvertCSVToAvro processor:-
Obj...avro.schema..{
	"type": "record",
	"name": "sample",
	"doc": "Schema generated by Kite",
	"fields": [{
		"name": "status",
		"type": "long",
		"doc": "Type inferred from '400'"
	},
	{
		"name": "body_bytes_sent",
		"type": "long",
		"doc": "Type inferred from '174'"
	},
	{
		"name": "reference",
		"type": "string",
		"doc": "Type inferred from 'google.com'"
	}]
}.avro.codec.snappy...y...Zf....N*...*.8.....google.com*.e...y...Zf....N*
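The schema above types `status` and `body_bytes_sent` as `long` because, once the quotes are stripped, the values look numeric. A rough Python approximation of that kind of inference is sketched below. The try-integer, then float, then string order is an assumption for illustration, not Kite's documented algorithm, and `infer_avro_type` and `build_schema` are hypothetical helpers.

```python
import json


def infer_avro_type(value):
    """Guess an Avro primitive type from a string value (assumed rules)."""
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def build_schema(name, header, row, delimiter="|"):
    """Build a record schema dict from one delimited header and sample row."""
    fields = [
        {
            "name": column,
            "type": infer_avro_type(value),
            "doc": f"Type inferred from '{value}'",
        }
        for column, value in zip(header.split(delimiter), row.split(delimiter))
    ]
    return {"type": "record", "name": name, "fields": fields}


schema = build_schema(
    "sample", "status|body_bytes_sent|referer", "400|174|google.com"
)
print(json.dumps(schema, indent=2))
```

If a field such as `status` should stay a string, the safer route is probably to supply an explicit schema rather than rely on inference from a sample row.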


2 REPLIES 2

Contributor

This worked nicely. Thanks Yash!