Created on 07-20-2018 08:10 AM - edited 09-16-2022 06:29 AM
Hi,
I have a very deeply nested JSON schema. How can I automatically generate Hive DDL from the JSON schema?
I searched online, but everything I found covers creating a Hive table from JSON data rather than from a schema.
Thanks,
Jai
Created 07-24-2018 06:59 AM
You can use the JSON SerDe. You have to create the table with a structure that mirrors the structure of the JSON.
For example:
data.json
{"X": 134, "Y": 55, "labels": ["L1", "L2"]}
{"X": 11, "Y": 166, "labels": ["L1", "L3", "L4"]}
create table
CREATE TABLE Point (
  X INT,
  Y INT,
  labels ARRAY<STRING>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'path/to/table';
Then upload your JSON file to the table's location path, set the right permissions, and you are good to go.
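For a deeply nested schema, writing the column list by hand gets tedious, so the mapping could be generated instead. Below is a minimal sketch in Python, not a complete tool: it assumes a plain JSON Schema that only uses "type", "properties" and "items" (no $ref, oneOf, etc.), and the type mapping (for example integer to BIGINT) is my assumption. The function names hive_type and hive_ddl are made up for illustration.

```python
# Assumed mapping from JSON Schema primitive types to Hive types.
PRIMITIVES = {
    "string": "STRING",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}


def hive_type(schema):
    """Return the Hive type string for one JSON Schema node."""
    kind = schema.get("type")
    if isinstance(kind, list):
        # e.g. ["string", "null"]: take the first non-null type.
        kind = next((k for k in kind if k != "null"), None)
    props = schema.get("properties", {})
    if kind == "object" and props:
        fields = ", ".join(f"{name}:{hive_type(sub)}" for name, sub in props.items())
        return f"STRUCT<{fields}>"
    if kind == "array":
        return f"ARRAY<{hive_type(schema.get('items', {}))}>"
    # Unknown types and objects without properties fall back to STRING here.
    return PRIMITIVES.get(kind, "STRING")


def hive_ddl(table, schema, location):
    """Build a CREATE TABLE statement from a top-level object schema."""
    columns = ",\n".join(
        f"  `{name}` {hive_type(sub)}" for name, sub in schema["properties"].items()
    )
    return (
        f"CREATE TABLE {table} (\n{columns}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


# Hypothetical schema describing the Point records above.
point_schema = {
    "type": "object",
    "properties": {
        "X": {"type": "integer"},
        "Y": {"type": "integer"},
        "labels": {"type": "array", "items": {"type": "string"}},
    },
}

print(hive_ddl("Point", point_schema, "path/to/table"))
```

Nested objects become STRUCT<...> and arrays become ARRAY<...> recursively, so the nesting depth of the schema does not matter to the generator. Field names that are reserved words or contain special characters would need extra handling.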
Created 08-21-2018 12:00 PM
Hi Ludof,
First of all, thanks a lot for the response, and my apologies for not replying sooner.
Actually, I have very complex XSDs with more than 2000 elements in nested XSD complex types, so the above solution would not work in my case. I cannot create the Hive table manually with that many elements and with objects nested up to the 10th level.
Sorry, I cannot share the code here, but this is how I implemented the project.
Goal: Ingest XML data into HDFS and query it using Hive/Impala.
Solution: Convert the XSD into a Hive Avro table and keep pumping XML -> Avro into HDFS.
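Since the actual code is not shared, the following is only a hypothetical illustration of the XSD -> Avro schema step, not the implementation used in the project. It is a minimal Python sketch that assumes inline complexType/sequence definitions only; named types, attributes, optional elements (minOccurs="0") and repeated record names are not handled, and the XSD-to-Avro type mapping is an assumption.

```python
import json
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from XSD built-in types to Avro primitive types.
XSD_TO_AVRO = {
    "string": "string",
    "int": "int",
    "long": "long",
    "double": "double",
    "boolean": "boolean",
}


def avro_type(element):
    """Return an Avro type for one xs:element node."""
    complex_type = element.find(f"{XS}complexType")
    if complex_type is None:
        # Simple element: strip the namespace prefix, default to string.
        xsd_type = element.get("type", "xs:string").split(":")[-1]
        avro = XSD_TO_AVRO.get(xsd_type, "string")
    else:
        # Inline complex type: every child element becomes a record field.
        children = complex_type.findall(f"{XS}sequence/{XS}element")
        avro = {
            "type": "record",
            "name": element.get("name"),
            "fields": [{"name": c.get("name"), "type": avro_type(c)} for c in children],
        }
    if element.get("maxOccurs", "1") != "1":
        # Repeating elements are wrapped in an Avro array.
        avro = {"type": "array", "items": avro}
    return avro


# Hypothetical XSD used only to exercise the sketch.
xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Point">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="X" type="xs:int"/>
        <xs:element name="labels" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(xsd)
schema = avro_type(root.find(f"{XS}element"))
print(json.dumps(schema, indent=2))
```

The resulting JSON could be saved as an .avsc file and referenced from a Hive table stored as Avro, which avoids typing the columns by hand.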
Thanks,
Jai