Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Nested JSON to columns using Impala SQL function

avatar
Expert Contributor

Hello,

 

I have a Impala table with a JSON column with values like below. I am trying to get the json values into columns using get_json_object(col1, etc.) . Can you help me with the syntax how to extract all the values as columns using SQL. Note the spaces, unicode and nesting in the json variables.

 

col1

--------------------

{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 4.852294921875}, u'CH6: Ox concentration (we-aux)': {u'unit': u'ppb', u'value': -84.73094995471016}}

{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 5.852294921875}, u'CH6: Ox concentration (we-aux)': {u'unit': u'ppb', u'value': -94.73094995471016}}

....

2 REPLIES 2

avatar
Contributor

https://impala.apache.org/docs/build/html/topics/impala_misc_functions.html#misc_functions__get_json... has some good documentation on how to use the get_json_object function.

 

What Impala version are you using? What is the type of the column that contains the JSON data?

 

I hit a couple issues when parsing the JSON you posted. I believe the JSON standard does not allow for single quotes. Standard online JSON parsers have trouble with the 'u' character as well.

 

I was able to get the following to work on Impala master:

 

[localhost:21000] default> select get_json_object("{\"CH4: NO2 (we-aux)\": {\"unit\": \"mV\", \"value\": 4.852294921875}, \"CH6: Ox concentration (we-aux)\": {\"unit\": \"ppb\", \"value\": -84.73094995471016}}", '$.*');
Query: select get_json_object("{\"CH4: NO2 (we-aux)\": {\"unit\": \"mV\", \"value\": 4.852294921875}, \"CH6: Ox concentration (we-aux)\": {\"unit\": \"ppb\", \"value\": -84.73094995471016}}", '$.*')
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| get_json_object('{"ch4: no2 (we-aux)": {"unit": "mv", "value": 4.852294921875}, "ch6: ox concentration (we-aux)": {"unit": "ppb", "value": -84.73094995471016}}', '$.*') |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"unit":"mV","value":4.852294921875},{"unit":"ppb","value":-84.73094995471016}] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

 

avatar
Expert Contributor

You are absolutely right. After struggling with various syntax, realized CDH5.16 version

impalad version 2.12.0-cdh5.16.1 

doesnt support get_json_object(). So finally using json.dumps() removed all the unicode u' characters and also removed all strange characters in the json fields to normal characters like CH4_NO2_WE_AUX. After that ended up using Hive instead of Impala with a query like below to extract the values as columns. The json_column1 is a string datatype.

--------------------------------------------
select b.b1, c.c1,c.c2,d.d1,d.d2
from json_table1 a
lateral view json_tuple(a.json_column1, 'CH4_NO2_WE_AUX', 'CH7_CO_CONCENTRATION_WE') b as b1,b2
lateral view json_tuple(b.b1,'unit','value') c as c1,c2
lateral view json_tuple(b.b2,'unit','value') d as d1,d2
;