- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
JSON parent class extraction from Hadoop table
- Labels:
-
Apache Hive
-
Apache Knox
-
Cloudera Hue
Created 04-04-2025 09:43 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I need to extract a parent JSON column present in Hadoop:
This is an example of my JSOn.Here I need to extract only values products class but not of offer. I cant use index of product as the the number of occurrences might change in future.
Have used the following regex
regexp_replace(regexp_replace(get_json_object(xml_data, '$.customer.products'),'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') AS prd_JSON
But is is only giving all values in a row , not in multiple rows.
Have also tried something like below, but it doesnt give correct result:
regexp_replace(regexp_replace(get_json_object(xml_data, '$.customer.products'),'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') AS prd_JSON
from prd_table
)
select
get_json_object(prd_item,'$.productCode') as productCode,
get_json_object(prd_item,'$.type') as Ttype
from prd
LATERAL VIEW OUTER EXPLODE(split(prd_JSON, ';')) p AS prd_item) a
{
"Customer": {
"CustomerId": "19900101",
"appl": [
{
***Some values
}
]
"products": [
{
"ProductCOde": ABC,
"Type": C1,
"Offer": 16575.0,
"Offer": [
{
"ProductCOde": A1,
"Offer": 2000.0
"Type": CBA,
}
{
"ProductCOde": A2,
"Offer": 2000.0
"Type": DBA,
}
{
"ProductCOde": A3,
"Offer": 3000.0
"Type": BBA,
}
]
}
{
"ProductCOde": XYZ,
"Type": C2,
"Offer": 2001,
"Offer": [
{
"ProductCOde": B1,
"Offer": 2000.0
"Type": BBA,
}
{
"ProductCOde": B2,
"Offer": 2000.0
"Type": BBA,
}
]
}
]
}
}
Can anyone point out what I am doing wrong here
