Member since: 02-21-2018 | Posts: 2 | Kudos Received: 0 | Solutions: 0
08-21-2018 12:00 PM
Hi Ludof,

First of all, thanks a lot for the response, and my apologies that I could not respond to you in a timely manner.

Actually, I have very complex XSDs with more than 2000 elements in nested XSD complex types, so the solution above would not work in my case. I cannot create the Hive table manually with that number of elements, and the objects are nested up to 10 levels deep. Sorry, I cannot share the code here, but this is how I implemented the project.

Goal: Ingest XML data into HDFS and query it using Hive/Impala.

Solution: Convert the XSD into a Hive Avro table and keep pumping XML -> Avro into HDFS.

1. I loaded all the XSDs into the XML Spy tool and generated a sample XML. I still had to fix some elements whose default values were off, because Spark's schema inference is quite intelligent. For example, "0000" was being inferred as long, which is correct for the value itself, but since it is in double quotes I would expect it to be a String; this is simply how XML Spy generated default values for alphanumeric fields. After that I had a fully curated sample XML file.
2. Wrote Spark-xml code, gave the sample XML as input, and converted it into an Avro file. As we know, an Avro file carries its schema in it.
3. Took the Avro schema and created a Hive table on top of it.
4. Finally, wrote the Spark job (see the sketch after this post). It reads XML files from HDFS; at read time, it asks Spark to apply my custom schema, which I obtained from the sample XML, instead of inferring it; it converts the XML into an Avro file; and it writes the Avro file to an HDFS location.
5. Query using Hive/Impala.

Thanks,
Jai
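For readers following this approach, here is a minimal sketch of what such a Spark job could look like. It is not the author's actual code; it assumes the spark-xml package (com.databricks:spark-xml) and Avro support are available, and the row tag, paths, and object name are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.StructType

    object XmlToAvroJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("xml-to-avro").getOrCreate()

        // Infer the schema once from the curated sample XML
        // ("record" is a hypothetical row tag).
        val customSchema: StructType = spark.read
          .format("com.databricks.spark.xml")
          .option("rowTag", "record")
          .load("/tmp/curated_sample.xml")      // hypothetical path to the curated sample
          .schema

        // Read the incoming XML files with that fixed schema instead of re-inferring it.
        val xmlDf = spark.read
          .format("com.databricks.spark.xml")
          .option("rowTag", "record")
          .schema(customSchema)
          .load("hdfs:///data/incoming/xml/")   // hypothetical HDFS input directory

        // Write the data out as Avro; the embedded Avro schema can then back a Hive
        // table (CREATE TABLE ... STORED AS AVRO) that is queryable from Hive/Impala.
        xmlDf.write
          .format("avro")                        // use "com.databricks.spark.avro" on Spark < 2.4
          .mode("append")
          .save("hdfs:///data/warehouse/xml_as_avro/")  // hypothetical HDFS output directory

        spark.stop()
      }
    }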
07-20-2018 08:10 AM
Hi,

I have a JSON schema that is very deeply nested. How can we automatically create the Hive DDL from the JSON schema? I did some googling, and all I am finding is how to create a Hive table from JSON data, not from the schema itself.

Thanks,
Jai
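One possible approach, offered only as a hedged sketch rather than a confirmed answer: if you have sample JSON documents (not just the JSON Schema file), you can let Spark infer the nested structure and render it as Hive-style column definitions via StructType.toDDL (available in Spark 2.4+). The paths, table name, and SerDe choice below are assumptions.

    import org.apache.spark.sql.SparkSession

    object JsonSchemaToHiveDdl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("json-to-hive-ddl").getOrCreate()

        // Infer the deeply nested schema from sample JSON documents (hypothetical path).
        val jsonDf = spark.read.json("hdfs:///data/samples/events.json")

        // StructType.toDDL emits Hive-compatible column definitions, including
        // nested struct/array/map types, which can be embedded in a CREATE TABLE.
        val ddl =
          s"""CREATE EXTERNAL TABLE events_json (${jsonDf.schema.toDDL})
             |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
             |STORED AS TEXTFILE
             |LOCATION 'hdfs:///data/warehouse/events_json/'""".stripMargin

        println(ddl)
        spark.stop()
      }
    }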
Labels: Apache Hive