
How to validate a JSON file against a JSON schema?

Super Collaborator

Hi,

I need to ingest only the JSON files that follow a valid schema.

I am trying to achieve this with the ValidateRecord processor.

I am supplying the same schema to both JsonTreeReader and JsonRecordSetWriter.

I am not using Avro because my input contains _ in the names.

(I came up with this schema by modifying the input file to remove the _, running InferAvroSchema on it, and then changing both to use _ again to match the input file.)

My schema and files match, but the records are being sent to the invalid relationship. Is there anything wrong with what I am doing?

Schema:

{
  "type": "record",
  "name": "iHist",
  "fields": [
    { "name": "file_name", "type": "string" },
    { "name": "plant", "type": "string" },
    { "name": "collector", "type": "string" },
    { "name": "name", "type": "string" },
    { "name": "unique_id", "type": "string" },
    { "name": "description", "type": "string" },
    { "name": "general_1", "type": "string" },
    { "name": "general_2", "type": "string" },
    { "name": "general_3", "type": "string" },
    { "name": "general_4", "type": "string" },
    { "name": "general_5", "type": "string" },
    { "name": "data_points", "type":
      { "type": "array",
        "items":
          { "type": "record",
            "name": "data_points",
            "fields": [
              { "name": "timestamp", "type": "string" },
              { "name": "value", "type": "string" },
              { "name": "quality", "type": "string" }
            ]
          }
      }
    }
  ]
}

Data file:

{
  "file-name": "tp-tcollec.tag.json",
  "plant": "P11A3",
  "collector": "test_Collector",
  "name": "tag_SAFETY_MARGN.F_CV",
  "unique-id": "1532358720761",
  "description": "test",
  "general-1": "",
  "general-2": "",
  "general-3": "",
  "general-4": "",
  "general-5": "",
  "datapoints": [
    { "timestamp": "2016-07-19T10:25:43.000Z", "value": "177", "quality": "100" },
    { "timestamp": "2016-07-19T10:25:42.000Z", "value": "177", "quality": "100" },
    { "timestamp": "2016-07-19T10:25:41.000Z", "value": "177", "quality": "100" }
  ]
}

I just need to validate whether the input file follows the schema. Is there a better way to do this?
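One quick sanity check that can be done outside NiFi is to compare the top-level field names in the schema with the keys in the data file. The following is only a minimal Python sketch, not a NiFi feature: the helper name `missing_and_extra` is hypothetical, and the two literals are abbreviated copies of the schema and data file as posted in this thread.

```python
def missing_and_extra(schema, record):
    """Compare top-level Avro-style field names with the keys of a JSON object.

    Returns (missing, extra): names the schema expects but the record lacks,
    and keys the record has that the schema does not declare.
    """
    expected = {field["name"] for field in schema["fields"]}
    actual = set(record)
    return sorted(expected - actual), sorted(actual - expected)


# Abbreviated copy of the posted schema (only four of the twelve fields).
schema = {
    "type": "record",
    "name": "iHist",
    "fields": [
        {"name": "file_name", "type": "string"},
        {"name": "plant", "type": "string"},
        {"name": "unique_id", "type": "string"},
        {"name": "data_points", "type": {
            "type": "array",
            "items": {
                "type": "record",
                "name": "data_points",
                "fields": [{"name": "timestamp", "type": "string"}],
            },
        }},
    ],
}

# Abbreviated copy of the posted data file, with the keys exactly as shown.
record = {
    "file-name": "tp-tcollec.tag.json",
    "plant": "P11A3",
    "unique-id": "1532358720761",
    "datapoints": [],
}

missing, extra = missing_and_extra(schema, record)
print("missing from data:", missing)
print("not in schema:", extra)
```

With the names exactly as they appear in this post, the check reports data_points, file_name, and unique_id as missing and datapoints, file-name, and unique-id as undeclared, so the names as posted do not line up between the schema and the data.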

[Attached screenshots: 82450-jrecsetwriter.jpg, 82449-jtreereader.jpg, 82448-validate.jpg]


Super Collaborator

Hi @Matt Burgess,

Any idea what I am doing wrong in the above case?


Super Collaborator

@Matt Burgess,

Thank you. I didn't know about Validate Field Names.

Super Collaborator

Hi @Matt Burgess,

Is there any way I can use ValidateRecord just to check whether a file follows a schema and then route it to valid?

I don't know why we need a Record Writer for validation. It changes the file format a little, for example by changing the order of the fields.

I just want the input file as output, with its contents unchanged, if it is valid. Or is there any other way I can achieve this?

Regards,

Sai

Master Guru

ValidateRecord is more about validating the individual records than it is about validating the entire flow file. If some records are valid and some are invalid, each type will be routed to the corresponding relationship. However, for invalid records we can't use the same record writer as for valid records, because we know it would fail (the records are invalid, after all), so there is a second RecordWriter for invalid records. You might use it to try to record the field names or something, but by the time ValidateRecord knows an individual record is invalid, it doesn't know that the record came in as Avro (for example), nor does it know that you might want it to go out as Avro. That's the flexibility and power of the Record Reader/Writer paradigm, but in this case the tradeoff is that you can't currently treat the entire flow file as valid or invalid.
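To make the per-record behavior concrete, here is a rough Python sketch of the idea. It is not NiFi code: the function names `is_valid` and `route_records` are hypothetical, and the validity rule (every required field present and a string) is a simplifying assumption standing in for real schema validation.

```python
def is_valid(record, required_fields):
    """Assumed rule: every required field is present and holds a string."""
    return all(isinstance(record.get(name), str) for name in required_fields)


def route_records(records, required_fields):
    """Decide per record, so one input can feed both relationships."""
    routed = {"valid": [], "invalid": []}
    for record in records:
        relationship = "valid" if is_valid(record, required_fields) else "invalid"
        routed[relationship].append(record)
    return routed


records = [
    {"timestamp": "2016-07-19T10:25:43.000Z", "value": "177", "quality": "100"},
    {"timestamp": "2016-07-19T10:25:42.000Z", "value": "177"},  # no "quality"
]
routed = route_records(records, ["timestamp", "value", "quality"])
print(len(routed["valid"]), "valid,", len(routed["invalid"]), "invalid")
```

Because the decision is made record by record, the original file as a whole never gets a single valid or invalid verdict, which is the tradeoff described above.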

It may make sense to have an "Invalid Record Strategy" property, to choose between "Individual Records" using the RecordWriters (the current behavior), or "Original FlowFile", which would ignore the RecordWriters and instead transfer the entire incoming flow file as-is to the 'invalid' relationship. Please feel free to file an improvement Jira for this capability.
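As a rough illustration of how the two strategies would differ, here is a small Python sketch. It models a proposed property that does not exist in NiFi today, so everything in it is hypothetical: the function `route_flow_file`, the presence-only validity check, and the assumption that a fully valid file would also be passed through unchanged to 'valid' under "Original FlowFile".

```python
import json


def has_required_fields(record, required_fields):
    """Assumed validity rule for this sketch: all required keys are present."""
    return all(name in record for name in required_fields)


def route_flow_file(content, required_fields, strategy="Individual Records"):
    """Return a dict mapping relationship name to outgoing content.

    `content` is assumed to be a JSON array of objects, given as text.
    """
    records = json.loads(content)
    valid = [r for r in records if has_required_fields(r, required_fields)]
    invalid = [r for r in records if not has_required_fields(r, required_fields)]

    if strategy == "Original FlowFile":
        # No writer involved: the incoming text is transferred untouched.
        relationship = "invalid" if invalid else "valid"
        return {relationship: content}

    # "Individual Records": each group is re-serialized by a writer,
    # so formatting may differ from the input.
    routed = {}
    if valid:
        routed["valid"] = json.dumps(valid)
    if invalid:
        routed["invalid"] = json.dumps(invalid)
    return routed


content = '[{"value": "177", "quality": "100"},\n {"value": "177"}]'
by_record = route_flow_file(content, ["value", "quality"])
whole_file = route_flow_file(content, ["value", "quality"], strategy="Original FlowFile")
print(sorted(by_record))
print(whole_file["invalid"] == content)
```

In the sketch, "Individual Records" splits the input across both relationships and rewrites it, while "Original FlowFile" sends the untouched text to a single relationship.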