Support Questions

Find answers, ask questions, and share your expertise

How to create hive table out of JSON Schema with s dynamic nested array

avatar
New Contributor

Hi all - I am trying to create a Hive table from nested JSON parquet data.  The problem is one object is dynamic and I want to store it as a string since it changes. 

Example JSON: 

{
"level1": {
"level2": {
"key1": "someString",
"level3": {
"level4": {
"key2": 1234,
"level5": [
{
"this": 1,
"changes": false,
"each": 12345,
"item": 0,
},
{
"something": 12345,
"new": true,
"here": [
123
]
}
]
}
}
}
}
}
 
Here is what I have tried:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
level1 STRUCT<,
level2 STRUCT<,
key1 STRING,
level3 STRUCT<,
level4 STRUCT<,
key2 BIGINT,
level5 ARRAY<STRING>>>>>
)
STORED AS PARQUET
LOCATION '/my/json/parquet/'
TBLPROPERTIES ("parquet.compression"="SNAPPY");

This will successfully create the table.  I can query down all levels except "level5" falls apart.  Is there a way I can cast the array in level5 into a string since it always changes??   

FYI I have tried 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe', but this does not mix with parquet format.  Please help! 
2 REPLIES 2

avatar
Community Manager

@Maicat Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Hive experts @caio_contente @cravani  who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Guru

@Maicat You can not typecast array to the string. There are 2 ways you can use

 1. Select the nth object of the array.

    SELECT level5[0] AS first_genre FROM my_table

   WHere 0 is the first object

2. You can flatten it

SELECT column1 FROM my_table LATERAL VIEW explode(level5) genre_table AS level5;