InferAvroSchema + GetAvroMetadata

Hello,

I'm using the InferAvroSchema processor to infer a schema from a CSV file. I want to use the GetAvroMetadata processor to extract the record name from the output of InferAvroSchema, but I don't know how to configure it.

I know I can use a Groovy script to extract the record name, but I think GetAvroMetadata is meant for this purpose too.

Can someone help me, please?

Thank you

1 REPLY

Re: InferAvroSchema + GetAvroMetadata

InferAvroSchema requires you to enter the record name yourself. If you need access to that record name later, you could set a variable on the process group and reference it with Expression Language in the Record Name property. The same variable is then available everywhere in the process group, so you wouldn't have to extract it.

If you don't have access to the value injected into the schema by InferAvroSchema, and the schema is in an attribute (let's say "inferred.avro.schema"), then you can use the jsonPath() function in NiFi Expression Language to extract the record name into a separate attribute. You'd need an UpdateAttribute to set "record.name" to the following:

${inferred.avro.schema:jsonPath("$.name")}

If your schema is in the content of the flow file, then since it is a JSON object you can use EvaluateJsonPath to get the record name into an attribute, using the following JSONPath expression:

$.name
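
To see what that expression selects, here is a minimal Python sketch outside NiFi. The sample schema is made up for illustration (the record name "customer" and the two fields are assumptions, and the exact output of InferAvroSchema may differ); the point is only that an Avro record schema is a JSON object whose top-level "name" key holds the record name.

```python
import json

# Hypothetical schema text, shaped like what a schema-inference step might
# produce for a two-column CSV. The names here are illustrative only.
inferred_schema = """
{
  "type": "record",
  "name": "customer",
  "fields": [
    {"name": "id",    "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""


def record_name(schema_text):
    """Return the top-level record name, i.e. what the JSONPath $.name selects."""
    schema = json.loads(schema_text)
    return schema["name"]


if __name__ == "__main__":
    print(record_name(inferred_schema))
```

Both options above (jsonPath() on an attribute, or EvaluateJsonPath on the content) perform this same top-level lookup; they differ only in where the schema text lives.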

ExtractAvroMetadata is for Avro files, so if your flow file contained Avro (with an embedded schema), you could use ExtractAvroMetadata (adding "avro.schema" to the list of metadata to extract) in order to get the schema. But this processor doesn't work for CSV files.
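
For context on why an Avro file differs from CSV here: an Avro container file starts with a header that stores metadata, including the schema under the key "avro.schema". The sketch below is a rough, standard-library-only illustration of reading that header as described in the Avro specification. It is not how NiFi implements ExtractAvroMetadata, and the helper names (read_long, write_long, build_header, read_avro_metadata) are hypothetical.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of every Avro container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift, result = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def write_long(value):
    """Encode one Avro long (used only to build the demo header)."""
    zz = (value << 1) ^ (value >> 63)
    out = bytearray()
    while zz & ~0x7F:
        out.append((zz & 0x7F) | 0x80)
        zz >>= 7
    out.append(zz)
    return bytes(out)


def read_avro_metadata(stream):
    """Return the header metadata map (str -> bytes) of a container file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:          # a zero count ends the map
            break
        if count < 0:           # negative count is followed by a byte size
            count = -count
            read_long(stream)
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            meta[key] = stream.read(read_long(stream))
    return meta


def build_header(schema):
    """Build a minimal in-memory header so the example needs no real file."""
    entries = {"avro.schema": json.dumps(schema).encode("utf-8"),
               "avro.codec": b"null"}
    out = bytearray(MAGIC) + write_long(len(entries))
    for key, value in entries.items():
        kb = key.encode("utf-8")
        out += write_long(len(kb)) + kb + write_long(len(value)) + value
    return bytes(out + write_long(0) + bytes(16))  # map end + sync marker


if __name__ == "__main__":
    header = build_header({"type": "record", "name": "customer", "fields": []})
    meta = read_avro_metadata(io.BytesIO(header))
    print(json.loads(meta["avro.schema"])["name"])
```

A CSV file has no such header, which is why there is nothing for ExtractAvroMetadata to read from it.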