Support Questions

edmund_prout · 08-29-2017

I have a text file that I'm reading into a NiFi flow. It consists of key-value pairs that look like the following:

status:"400" body_bytes_sent:"174" referer:"google.com" user_agent:"safari" host:"8.8.4.4" query_string:"devices"

status:"400" body_bytes_sent:"172" referer:"yahoo.com" user_agent:"Chrome" host:"8.8.4.3" query_string:"books"

Currently the TailFile processor is successfully reading these files as they are created and appended to. However, I want to output them to Kafka in Avro format. Any idea which processor(s) I need to convert these text files into Avro in my flow? What would the configuration of those processors look like?

Shu_ashu · 08-29-2017

Hi @Ed Prout,

If your text file consists of key-value pairs, you can build the flow from these processors, in this order:

SplitText processor: split the text line by line, so that each line becomes its own flowfile.
ExtractText processor: use regular expressions to extract the values; each extracted value becomes an attribute of the flowfile.
ReplaceText processor: replace the content of the flowfile with those attributes.
InferAvroSchema processor: infer the schema from the flowfile content.
ConvertCSVToAvro processor: convert the flowfile content into Avro format.
PublishKafka processor: publish the Avro data to the topic.
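
To make the ExtractText and ReplaceText steps concrete, here is a minimal Python sketch of what they do to a single line: pull the quoted values out with a regex, then rewrite the line as one delimited row. It runs outside NiFi and is only an illustration. The pattern `PAIR_RE` and the helper `line_to_row` are hypothetical names, and the regex is an assumption about the input shown in the question, not the expression from the screenshots.

```python
import re

# Hypothetical pattern (an assumption): a key, a colon, then a double-quoted value.
PAIR_RE = re.compile(r'(\w+):"([^"]*)"')


def line_to_row(line, delimiter="|"):
    """Return (keys, delimited_values) for one key:"value" log line."""
    pairs = PAIR_RE.findall(line)          # [(key, value), ...] in order of appearance
    keys = [key for key, _ in pairs]
    values = [value for _, value in pairs]
    return keys, delimiter.join(values)    # values only, joined like a CSV row


line = 'status:"400" body_bytes_sent:"174" referer:"google.com"'
keys, row = line_to_row(line)
print(keys)  # ['status', 'body_bytes_sent', 'referer']
print(row)   # 400|174|google.com
```

In the flow itself, the keys correspond to the attribute names set by ExtractText and the joined row corresponds to the content written by ReplaceText.
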
Configurations for the Processors:-
SplitText:-
splittext.png
ExtractText:-
extracttext.png
ReplaceText:-
replcetext.png
InferAvroSchema:-
inferavroschema.png
ConvertCSVToAvro:-
convertcsvtoavro.png
Note:-
I used a pipe (|) as the delimiter in the replacement value of the ReplaceText processor. You can use any delimiter you like, but it must match the delimiter used in the CSV header definition of the InferAvroSchema processor.
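
A small Python sketch of why the two delimiters have to agree, assuming a pipe-delimited row like the one described above. The helper `parse_delimited` is a hypothetical name used only for this illustration; it is not part of NiFi.

```python
import csv


def parse_delimited(header, row, delimiter="|"):
    """Pair header names with row values, both split on the same delimiter."""
    names = next(csv.reader([header], delimiter=delimiter))
    values = next(csv.reader([row], delimiter=delimiter))
    if len(names) != len(values):
        raise ValueError(
            f"header has {len(names)} column(s) but row has {len(values)}"
        )
    return dict(zip(names, values))


# Header and row both use the pipe: every value gets a column name.
record = parse_delimited("status|body_bytes_sent|referer", "400|174|google.com")
print(record)

# Header written with commas while the row uses pipes: the header is read
# as a single column, so the columns no longer line up.
try:
    parse_delimited("status,body_bytes_sent,referer", "400|174|google.com")
except ValueError as err:
    print("mismatch:", err)
```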

I used only a few of the key-value pairs as input.
Input:-
status:"400" body_bytes_sent:"174" referer:"google.com"
Output after the ConvertCSVToAvro processor:-
Obj...avro.schema..{
	"type": "record",
	"name": "sample",
	"doc": "Schema generated by Kite",
	"fields": [{
		"name": "status",
		"type": "long",
		"doc": "Type inferred from '400'"
	},
	{
		"name": "body_bytes_sent",
		"type": "long",
		"doc": "Type inferred from '174'"
	},
	{
		"name": "reference",
		"type": "string",
		"doc": "Type inferred from 'google.com'"
	}]
}.avro.codec.snappy...y...Zf....N*...*.8.....google.com*.e...y...Zf....N*
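
The schema above types `status` and `body_bytes_sent` as `long` and the third field as `string`. The following Python sketch shows one simple way such types could be inferred from sample values. It is a simplified assumption for illustration, not the actual Kite or InferAvroSchema algorithm, and `infer_type` and `infer_schema` are hypothetical names. The field names are passed in separately, much as they come from the header definition in the flow.

```python
import json


def infer_type(value):
    """Guess an Avro primitive type name from a sample string value."""
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def infer_schema(names, values, record_name="sample"):
    """Build a minimal Avro record schema from column names and one sample row."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            {"name": name, "type": infer_type(value)}
            for name, value in zip(names, values)
        ],
    }


schema = infer_schema(
    ["status", "body_bytes_sent", "referer"],
    ["400", "174", "google.com"],
)
print(json.dumps(schema, indent=2))
```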

edmund_prout · 09-06-2017

This worked nicely. Thanks Yash!

Nifi convert text file consisting of key value pairs to avro