Support Questions

Find answers, ask questions, and share your expertise

Convert Json flow file data to ORC

Explorer

My target table is an ORC with partitions. The source of data is Json files coming from IBM MQ.I am currently splitting the Json and Merging the Json files. Now , I want to ingest this into ORC tables. How can I accomplish it with NIFI?

I see a Json to Avro and AVRO to ORC converter but the challenge is my Json file is very nested,The Json to AVRO processor is asking for a AVRO schema. I am new to the AVRO format. Are there any tools around to create an AVRO schema.Also, Can this flow directly write into partition tables?

The second option, i figured out is writing into a staging table and selecting and writing into ORC table.I don't want to do a full load every time so for delta loading my script would be couple of SQL statements like

with max_row_id as( select max(row_id) as MAX from target table)

insert into dc_transdetail_orc select * from staging table where row_id > max_row_id

Can the above query be accomplished by PUTSQL,PUTHIVEQL orPUTDATABASERECORD? From the description of the processor, it looks like it accepts only query at a time.

Any advice on this is highly appreciated. Thanks

3 REPLIES 3

Super Guru

You can try InferAvroSchema before ConvertJsonToAvro, you can set the schema destination as "flowfile-attribute" and it will put the schema into an attribute called "inferred.avro.schema". You can then set the schema property in ConvertJsonToAvro to the following NiFi Expression Language expression:

${inferred.avro.schema}

And the processor will use the inferred schema to do the conversion. If you find that the inferred schema is not correct, then you could paste it into a text application and make the necessary changes to type, field names, etc. Then you could paste the changes directly into the schema property (rather than the EL expression above) and restart your flow to use the explicit schema.

Explorer

Thanks Matt for the above information.
I followed the instructions provided . Below is my flow.

SplitJson --> InforAvroSchema --> ConvertJsonTo Avro-->PutEmail. I don't see an output generated from InforAVroSchema. Nothing is sent to my email. I also tried to use SplitJson --> InforAvroSchema --> ConvertJsonTo Avro-->PutFTP -->PUTEMAIL. I don't see any files generated.In the InforAvro Schema, The input says 10 files but for some reason the output is not generated. My goal is to see the schema file. Please let me know if i am doing something wrong.

Super Guru

What are your settings for InferAvroSchema? What are you trying to send in PutEmail? If you want an Avro representation of the individual JSON objects, then as long as InferAvroSchema is outputting to flowfile-attribute (which puts it in the "inferred.avro.schema" attribute as described above), then the flow files should still contain the original JSON, and if ${inferred.avro.schema} is specified as the schema in ConvertJsonToAvro, then the flow files' contents should contain valid Avro versions of the incoming JSON files.

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.