- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Parsing nested JSOn in HIVE
- Labels:
-
Apache Hive
Created 02-19-2025 07:41 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a table with one column as JSON file. I am using lateral explode to parse values.The code is working fine for classes except for one where subclasses have similar variable name :
Json:
{
"Customer": {
"CustomerId": "19900101",
"appl": [
{
***Some values
}
]
"products": [
{
"ProductCOde": ABC,
"Type": C1,
"Offer": 16575.0,
"Offer": [
{
"ProductCOde": A1,
"Offer": 2000.0
"Type": CBA,
}
{
"ProductCOde": A2,
"Offer": 2000.0
"Type": DBA,
}
{
"ProductCOde": A3,
"Offer": 3000.0
"Type": BBA,
}
]
}
{
"ProductCOde": XYZ,
"Type": C2,
"Offer": 2001,
"Offer": [
{
"ProductCOde": B1,
"Offer": 2000.0
"Type": BBA,
}
{
"ProductCOde": B2,
"Offer": 2000.0
"Type": BBA,
}
]
}
]
}
}
Code:
with product as (
select
xml_data,
get_json_object(xml_data,'$.customer.customerid')as applicationid,
regexp_replace(regexp_replace(get_json_object(xml_data, '$.customer.products'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS product_JSOn
from Custome_info
)
select
applicationid,
get_json_object(prod_item,'$.ProductCOde') as ProductCOde,
get_json_object(prod_item,'$.type') as ttype
from product
LATERAL VIEW OUTER EXPLODE(split(product_JSOn, ';')) p AS prod_item;
Output
ApplicationID ProductCode ttype
19900101 ABC NULL
19900101 A1 NULL
19900101 A2 NULL
19900101 B1 NULL
I am not able to figure why values of subclasses are populating here
Created 03-05-2025 10:30 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To resolve the issue where the ttype column returns NULL and subclasses' values aren't being parsed correctly, follow these steps:
1. Case Sensitivity in JSON Paths
The JSON keys Type and ProductCOde use uppercase letters. Correct the paths in get_json_object to match the exact case:
Use '$.Type' instead of '$.type'.
Ensure '$.ProductCOde' matches the case in the JSON (if it's a typo, adjust accordingly).
2. Properly Handle Nested Arrays
Avoid flattening both the outer products array and inner Offer array into the same lateral view. Use separate lateral views for each array level.
Corrected Code
WITH product AS (
SELECT
xml_data,
GET_JSON_OBJECT(xml_data, '$.Customer.CustomerId') AS applicationid,
-- Extract products array as a JSON string
GET_JSON_OBJECT(xml_data, '$.Customer.products') AS products_json
FROM Custome_info
),
-- Explode the products array
exploded_products AS (
SELECT
applicationid,
product_item
FROM product
LATERAL VIEW OUTER EXPLODE(
SPLIT(
REGEXP_REPLACE(
REGEXP_REPLACE(products_json, '^\\[|\\]$', ''), -- Remove outer brackets
'\\}\\,\\{', '\\}\\;\\{' -- Split products
),
';'
)
) p AS product_item
),
-- Explode the Offer array within each product
exploded_offers AS (
SELECT
applicationid,
GET_JSON_OBJECT(product_item, '$.ProductCOde') AS ProductCode,
GET_JSON_OBJECT(product_item, '$.Type') AS ttype,
offer_item
FROM exploded_products
LATERAL VIEW OUTER EXPLODE(
SPLIT(
REGEXP_REPLACE(
REGEXP_REPLACE(
GET_JSON_OBJECT(product_item, '$.Offer'),
'^\\[|\\]$', '' -- Remove inner Offer brackets
),
'\\}\\,\\{', '\\}\\;\\{' -- Split Offer items
),
';'
)
) o AS offer_item
)
-- Combine results (products and their offers)
SELECT
applicationid,
COALESCE(
GET_JSON_OBJECT(offer_item, '$.ProductCOde'), -- Offer-level ProductCode
ProductCode -- Product-level ProductCode
) AS ProductCode,
COALESCE(
GET_JSON_OBJECT(offer_item, '$.Type'), -- Offer-level Type
ttype -- Product-level Type
) AS ttype
FROM exploded_offers;
Explanation
Case Sensitivity Fix: Using '$.Type' ensures the correct extraction of the Type field from the JSON.
Nested Array Handling:
First, explode the products array to get individual product objects.
Then, explode the Offer array within each product to access sub-offer details.
Combined Results: Use COALESCE to merge product-level and offer-level values into a single output, ensuring both levels are represented.
Expected Output
SpoilerApplicationID | ProductCode | ttype
19900101 | ABC | C1
19900101 | A1 | CBA
19900101 | A2 | DBA
19900101 | A3 | BBA
19900101 | XYZ | C2
19900101 | B1 | BBA
19900101 | B2 | BBA
Happy hadooping
