Support Questions

Find answers, ask questions, and share your expertise

Parsing nested JSOn in HIVE

avatar

I have a table with one column as JSON file. I am using lateral explode to parse values.The code is working fine for classes except for one where subclasses have similar variable name :

Json:

{
"Customer": {
"CustomerId": "19900101",
"appl": [
{
***Some values
}
]

"products": [
{
"ProductCOde": ABC,
"Type": C1,
"Offer": 16575.0,

"Offer": [
{
"ProductCOde": A1,
"Offer": 2000.0
"Type": CBA,
}

{
"ProductCOde": A2,
"Offer": 2000.0
"Type": DBA,
}
{
"ProductCOde": A3,
"Offer": 3000.0
"Type": BBA,
}
]
}
{
"ProductCOde": XYZ,
"Type": C2,
"Offer": 2001,

"Offer": [
{
"ProductCOde": B1,
"Offer": 2000.0
"Type": BBA,
}

{
"ProductCOde": B2,
"Offer": 2000.0
"Type": BBA,
}
]
}
]
}
}

 

Code:

with product as (
select
xml_data,
get_json_object(xml_data,'$.customer.customerid')as applicationid,

regexp_replace(regexp_replace(get_json_object(xml_data, '$.customer.products'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS product_JSOn
from Custome_info

)
select

applicationid,

get_json_object(prod_item,'$.ProductCOde') as ProductCOde,

get_json_object(prod_item,'$.type') as ttype

from product
LATERAL VIEW OUTER EXPLODE(split(product_JSOn, ';')) p AS prod_item;



Output
ApplicationID ProductCode ttype
19900101 ABC NULL
19900101 A1 NULL
19900101 A2 NULL
19900101 B1 NULL

 

I am not able to figure why values of subclasses are populating here

1 REPLY 1

avatar
Master Mentor

@Rich_Learner 

To resolve the issue where the ttype column returns NULL and subclasses'  values aren't being parsed correctly, follow these steps:

1. Case Sensitivity in JSON Paths

The JSON keys Type and ProductCOde use uppercase letters. Correct the paths in get_json_object to match the exact case:

  • Use '$.Type' instead of '$.type'.

  • Ensure '$.ProductCOde'  matches the case in the JSON (if it's a typo, adjust accordingly).

2. Properly Handle Nested Arrays

Avoid flattening both the outer products array and inner Offer array into the same lateral view. Use separate lateral views for each array level.

Corrected Code

 

Spoiler

WITH product AS (
SELECT
xml_data,
GET_JSON_OBJECT(xml_data, '$.Customer.CustomerId') AS applicationid,
-- Extract products array as a JSON string
GET_JSON_OBJECT(xml_data, '$.Customer.products') AS products_json
FROM Custome_info
),

-- Explode the products array
exploded_products AS (
SELECT
applicationid,
product_item
FROM product
LATERAL VIEW OUTER EXPLODE(
SPLIT(
REGEXP_REPLACE(
REGEXP_REPLACE(products_json, '^\\[|\\]$', ''), -- Remove outer brackets
'\\}\\,\\{', '\\}\\;\\{' -- Split products
),
';'
)
) p AS product_item
),

-- Explode the Offer array within each product
exploded_offers AS (
SELECT
applicationid,
GET_JSON_OBJECT(product_item, '$.ProductCOde') AS ProductCode,
GET_JSON_OBJECT(product_item, '$.Type') AS ttype,
offer_item
FROM exploded_products
LATERAL VIEW OUTER EXPLODE(
SPLIT(
REGEXP_REPLACE(
REGEXP_REPLACE(
GET_JSON_OBJECT(product_item, '$.Offer'),
'^\\[|\\]$', '' -- Remove inner Offer brackets
),
'\\}\\,\\{', '\\}\\;\\{' -- Split Offer items
),
';'
)
) o AS offer_item
)

-- Combine results (products and their offers)
SELECT
applicationid,
COALESCE(
GET_JSON_OBJECT(offer_item, '$.ProductCOde'), -- Offer-level ProductCode
ProductCode -- Product-level ProductCode
) AS ProductCode,
COALESCE(
GET_JSON_OBJECT(offer_item, '$.Type'), -- Offer-level Type
ttype -- Product-level Type
) AS ttype
FROM exploded_offers;

 

Explanation

  1. Case Sensitivity Fix: Using '$.Type' ensures the correct extraction of the Type field from the JSON.

  2. Nested Array Handling:

    • First, explode the products array to get individual product objects.

    • Then, explode the Offer array within each product to access sub-offer details.

  3. Combined Results: Use COALESCE to merge product-level and offer-level values into a single output, ensuring both levels are represented.

    Expected Output

    Spoiler
    ApplicationID | ProductCode | ttype
    19900101      | ABC  | C1
    19900101      | A1       | CBA
    19900101      | A2      | DBA
    19900101      | A3      | BBA
    19900101      | XYZ    | C2
    19900101      | B1       | BBA
    19900101      | B2      | BBA

Happy hadooping