I have nested JSON data and would like to load it into a Hive Avro table, as schema evolution is a requirement. How do I extract the schema from the JSON for Hive?
Created 06-23-2020 04:18 PM
Hi,
NiFi is one of the options.
Created 06-23-2020 07:25 PM
@rpathak thank you for the reply. I haven't worked with NiFi. I was wondering whether there is another way to pull the schema from the JSON and convert it to Avro, so that I can use the same schema in my Hive table to load the data.
Created 06-24-2020 05:02 AM
@Zerath I agree NiFi is a great tool for this, but you can also do it directly in Hive. One approach is to create a Hive table in the original data format and schema (source_table). Make sure you can run select * from source_table and see the expected results. Next, create a table in the Avro data format with the target schema (final_table). With source_table and final_table created, you simply execute:
insert into final_table select * from source_table;
The results in final_table will be stored in Avro format.
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further questions on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.
Thanks,
Steven @ DFHZ
Created 06-24-2020 09:25 AM
Hi @stevenmatison, my problem is that the JSON file has 300+ columns. It would be very tedious to build a table on a JSON schema of 300 columns by hand and then build the Avro schema for the same number of columns by hand again. I was thinking along the lines of inferring the schema from the JSON, generating an Avro schema file from it, and then supplying that Avro schema file in the table properties of the Hive table. Please let me know if this is feasible. Thank you.
Created 06-24-2020 10:14 AM