Support Questions

Find answers, ask questions, and share your expertise

Nested JSON to columns using Impala SQL function

avatar
Expert Contributor

Hello,

 

I have a Impala table with a JSON column with values like below. I am trying to get the json values into columns using get_json_object(col1, etc.) . Can you help me with the syntax how to extract all the values as columns using SQL. Note the spaces, unicode and nesting in the json variables.

 

col1

--------------------

{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 4.852294921875}, u'CH6: Ox concentration (we-aux)': {u'unit': u'ppb', u'value': -84.73094995471016}}

{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 5.852294921875}, u'CH6: Ox concentration (we-aux)': {u'unit': u'ppb', u'value': -94.73094995471016}}

....

2 REPLIES 2

avatar
Contributor

https://impala.apache.org/docs/build/html/topics/impala_misc_functions.html#misc_functions__get_json... has some good documentation on how to use the get_json_object function.

 

What Impala version are you using? What is the type of the column that contains the JSON data?

 

I hit a couple issues when parsing the JSON you posted. I believe the JSON standard does not allow for single quotes. Standard online JSON parsers have trouble with the 'u' character as well.

 

I was able to get the following to work on Impala master:

 

[localhost:21000] default> select get_json_object("{\"CH4: NO2 (we-aux)\": {\"unit\": \"mV\", \"value\": 4.852294921875}, \"CH6: Ox concentration (we-aux)\": {\"unit\": \"ppb\", \"value\": -84.73094995471016}}", '$.*');
Query: select get_json_object("{\"CH4: NO2 (we-aux)\": {\"unit\": \"mV\", \"value\": 4.852294921875}, \"CH6: Ox concentration (we-aux)\": {\"unit\": \"ppb\", \"value\": -84.73094995471016}}", '$.*')
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| get_json_object('{"ch4: no2 (we-aux)": {"unit": "mv", "value": 4.852294921875}, "ch6: ox concentration (we-aux)": {"unit": "ppb", "value": -84.73094995471016}}', '$.*') |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"unit":"mV","value":4.852294921875},{"unit":"ppb","value":-84.73094995471016}] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

 

avatar
Expert Contributor

You are absolutely right. After struggling with various syntax, realized CDH5.16 version

impalad version 2.12.0-cdh5.16.1 

doesnt support get_json_object(). So finally using json.dumps() removed all the unicode u' characters and also removed all strange characters in the json fields to normal characters like CH4_NO2_WE_AUX. After that ended up using Hive instead of Impala with a query like below to extract the values as columns. The json_column1 is a string datatype.

--------------------------------------------
select b.b1, c.c1,c.c2,d.d1,d.d2
from json_table1 a
lateral view json_tuple(a.json_column1, 'CH4_NO2_WE_AUX', 'CH7_CO_CONCENTRATION_WE') b as b1,b2
lateral view json_tuple(b.b1,'unit','value') c as c1,c2
lateral view json_tuple(b.b2,'unit','value') d as d1,d2
;