Apache NIFI - Generic framework to parse JSON and store in HIVE tables

NIFI Version - 1.7.0.3.2.0.0-520, HiveServer2.

Hi. I'm very new to NiFi. I have a requirement to develop a generic framework that parses JSON files, processes them, extracts the data, and stores it in a Hive table in a structured format. The schema will differ from one JSON file to the next, and depending on the source a file may contain either simple (flat) or nested JSON.

The client expects us to implement this using Apache NiFi only. I have already gone through many HCC tech notes, but they either did not work or did not match my requirement. Almost all of them require us to specify the schema field names manually in the EvaluateJsonPath / SplitJson processors. Our expectation is that NiFi should handle this without much manual work. In a nutshell, the flow should be generic and able to handle all types of JSON files.

Again, I am a rookie with NiFi, so I would appreciate step-by-step, screenshot-based guidance. It would be of great help. If the exact requirement is too complex to implement, kindly suggest an approach that comes at least close to it using NiFi. Many thanks in advance.

Re: Apache NIFI - Generic framework to parse JSON and store in HIVE tables

@Adarsh R

You can try the InferAvroSchema processor with Schema Output Destination set to flowfile-attribute.

The processor then adds an "inferred.avro.schema" attribute to the flowfile. Use this attribute in your record-oriented processors.

Refer to this and this link for more details on the usage of, and the challenges with, the InferAvroSchema processor.
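To give a feel for what schema inference produces, here is a minimal Python sketch that derives an Avro-style record schema from one sample JSON document, including a nested object. It is only an illustration under simplifying assumptions (a single sample record, homogeneous arrays, no nullable unions); it is not the algorithm InferAvroSchema actually uses, and the function names, the sample data, and the record name `inferred_record` are all made up for this example.

```python
import json


def infer_avro_type(value, name):
    """Map one parsed JSON value to a simplified Avro type."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element decides
        # the item type; an empty array falls back to string.
        items = infer_avro_type(value[0], name + "_item") if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        # Nested JSON objects become nested Avro records
        return infer_record_schema(value, name)
    raise TypeError("unsupported JSON value for field " + name)


def infer_record_schema(record, name):
    """Build an Avro record schema (as a dict) from one JSON object."""
    fields = [{"name": key, "type": infer_avro_type(val, key)}
              for key, val in record.items()]
    return {"type": "record", "name": name, "fields": fields}


# Hypothetical sample document with one nested object and one array
sample = json.loads(
    '{"id": 7, "name": "a", "address": {"city": "x", "zip": 12345}, "tags": ["p", "q"]}'
)
schema = infer_record_schema(sample, "inferred_record")
print(json.dumps(schema, indent=2))
```

In the flow itself the equivalent schema text would live in the "inferred.avro.schema" attribute, so no field names have to be typed in by hand. Note that a schema inferred from a single sample can be wrong for later records (for example, a field that is sometimes missing or null), which is one of the challenges to watch for.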