Support Questions

Find answers, ask questions, and share your expertise

JSON parent class extraction from Hadoop table

avatar

I need to extract a  parent JSON column present in Hadoop:

This is an example of my JSOn.Here I need to extract only values products class but not of offer. I cant use index of product as the the number of occurrences might change in future.

Have used the following regex 

regexp_replace(regexp_replace(get_json_object(xml_data, '$.customer.products'),'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') AS prd_JSON

But is is only giving all values in a row , not in multiple rows.

Have also tried something like below, but it doesnt give correct result:

regexp_replace(regexp_replace(get_json_object(xml_data, '$.customer.products'),'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') AS prd_JSON
from prd_table
)
select

get_json_object(prd_item,'$.productCode') as productCode,
get_json_object(prd_item,'$.type') as Ttype
from prd

LATERAL VIEW OUTER EXPLODE(split(prd_JSON, ';')) p AS prd_item) a

 

{
"Customer": {
"CustomerId": "19900101",
"appl": [
{
***Some values
}
]

"products": [
{
"ProductCOde": ABC,
"Type": C1,
"Offer": 16575.0,

"Offer": [
{
"ProductCOde": A1,
"Offer": 2000.0
"Type": CBA,
}

{
"ProductCOde": A2,
"Offer": 2000.0
"Type": DBA,
}
{
"ProductCOde": A3,
"Offer": 3000.0
"Type": BBA,
}
]
}
{
"ProductCOde": XYZ,
"Type": C2,
"Offer": 2001,

"Offer": [
{
"ProductCOde": B1,
"Offer": 2000.0
"Type": BBA,
}

{
"ProductCOde": B2,
"Offer": 2000.0
"Type": BBA,
}
]
}
]
}
}

 

Can anyone point out what I am doing wrong here

 

1 REPLY 1

avatar
Master Collaborator

Hi @Rich_Learner 

can you try this:

"SELECT
get_json_object(product_json, '$.ProductCOde') AS product_code,
get_json_object(product_json, '$.Type') AS product_type
FROM customer_table
LATERAL VIEW
json_tuple(json_column, 'Customer') c AS customer_json
LATERAL VIEW
json_tuple(customer_json, 'products') p AS products_json
LATERAL VIEW
explode(from_json(products_json, 'array<map<string,string>>')) product_table AS product_json
;""

 

OR

 

WITH cleaned_json AS (
SELECT
regexp_replace(
regexp_replace(
get_json_object(json_column, '$.Customer.products'),
'\\}\\s*,\\s*\\{',
'}~{'
),
'\\[|\\]',
''
) AS flat_products
FROM customer_table
),
split_json AS (
SELECT split(flat_products, '~') AS product_array
FROM cleaned_json
)
SELECT
get_json_object(item, '$.ProductCOde') AS product_code,
get_json_object(item, '$.Type') AS product_type
FROM split_json
LATERAL VIEW explode(product_array) exploded_table AS item;

 

Ensure your JSON keys match case-sensitively and Use consistent JSON structure. If offer is both a number and an array in different objects, consider preprocessing or cleaning up such inconsistencies.

 

Regards,

Chethan YM