Member since
03-29-2020
1
Post
0
Kudos Received
0
Solutions
03-29-2020
05:58 AM
@saaga119 The solution here is to create the external table in the format you need to query the raw xml data. Seems like you have this done already. Next create a native hive table. It could still be external too. The new table should be the schema for the fields you want with required ORC lines. You can also choose some other text format; for example: csv. Last INSERT INTO new_table SELECT * FROM external_table. The main idea here is to use external and staging tables, then INSERT INTO SELECT to fill the final table you want. Final table could then be optimized for performance (ORC). This kind of idea also allows multiple external/raw/staging tables able to be combined (select w/ join) into a single final optimized table (orc, compression, partitions, buckets, etc).
... View more