Created 01-23-2020 07:47 AM
Hello,
I have a Impala table with a JSON column with values like below. I am trying to get the json values into columns using get_json_object(col1, etc.) . Can you help me with the syntax how to extract all the values as columns using SQL. Note the spaces, unicode and nesting in the json variables.
col1
--------------------
{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 4.852294921875}, u'CH6: Ox concentration (we-aux)': {u'unit': u'ppb', u'value': -84.73094995471016}}
{u'CH4: NO2 (we-aux)': {u'unit': u'mV', u'value': 5.852294921875}, u'CH6: Ox concentration (we-aux)': {u'unit': u'ppb', u'value': -94.73094995471016}}
....
Created 01-28-2020 08:15 AM
https://impala.apache.org/docs/build/html/topics/impala_misc_functions.html#misc_functions__get_json... has some good documentation on how to use the get_json_object function.
What Impala version are you using? What is the type of the column that contains the JSON data?
I hit a couple issues when parsing the JSON you posted. I believe the JSON standard does not allow for single quotes. Standard online JSON parsers have trouble with the 'u' character as well.
I was able to get the following to work on Impala master:
[localhost:21000] default> select get_json_object("{\"CH4: NO2 (we-aux)\": {\"unit\": \"mV\", \"value\": 4.852294921875}, \"CH6: Ox concentration (we-aux)\": {\"unit\": \"ppb\", \"value\": -84.73094995471016}}", '$.*');
Query: select get_json_object("{\"CH4: NO2 (we-aux)\": {\"unit\": \"mV\", \"value\": 4.852294921875}, \"CH6: Ox concentration (we-aux)\": {\"unit\": \"ppb\", \"value\": -84.73094995471016}}", '$.*')
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| get_json_object('{"ch4: no2 (we-aux)": {"unit": "mv", "value": 4.852294921875}, "ch6: ox concentration (we-aux)": {"unit": "ppb", "value": -84.73094995471016}}', '$.*') |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"unit":"mV","value":4.852294921875},{"unit":"ppb","value":-84.73094995471016}] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Created on 01-28-2020 08:41 AM - edited 01-28-2020 08:49 AM
You are absolutely right. After struggling with various syntax, realized CDH5.16 version
impalad version 2.12.0-cdh5.16.1
doesnt support get_json_object(). So finally using json.dumps() removed all the unicode u' characters and also removed all strange characters in the json fields to normal characters like CH4_NO2_WE_AUX. After that ended up using Hive instead of Impala with a query like below to extract the values as columns. The json_column1 is a string datatype.
--------------------------------------------
select b.b1, c.c1,c.c2,d.d1,d.d2
from json_table1 a
lateral view json_tuple(a.json_column1, 'CH4_NO2_WE_AUX', 'CH7_CO_CONCENTRATION_WE') b as b1,b2
lateral view json_tuple(b.b1,'unit','value') c as c1,c2
lateral view json_tuple(b.b2,'unit','value') d as d1,d2
;