Created 07-26-2016 07:09 PM
I have a table named EmpData in a MySQL server with two fields: EmpID (INT) and Details (NVARCHAR). The Details field contains a JSON string.
EmpID | Details
567   | {"name": "Michel", "address": {"house no": "12", "street": "Johnson road", "city": "London", "country": "UK"}}
927   | {"name": "John", "address": {"house no": "99", "street": "Johnson road", "city": "London", "country": "UK"}}
I want to import the Details field's JSON data into an HDP Hive table EmpHiveStore and query the Hive table like:
SELECT name, address.street FROM EmpHiveStore;
Is there any way to import JSON field data into a Hive table using Sqoop?
Thank You.
Created 07-27-2016 10:09 AM
Your JSON there is just going to be stored as a string by the looks of it, unless your RDBMS is using a JSON type.
To import it, you can have it land in a string field and then use Hive to convert it to a map type, or just use Hive UDFs to query it as JSON straight from the string field (try get_json_object in Hive, for instance).
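For example, a minimal sketch, assuming the JSON landed in a string column Details of the table EmpHiveStore (names taken from the question above):

-- Pull fields out of the JSON string with the get_json_object UDF.
SELECT get_json_object(Details, '$.name')           AS name,
       get_json_object(Details, '$.address.street') AS street
FROM   EmpHiveStore;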
Once you've got the bits you want, you can transform them into a complex-type table with proper maps or structs, for instance.
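Something along these lines; a sketch only, where the target table EmpParsed and its columns are hypothetical names (named_struct is a standard Hive UDF):

-- Hypothetical target table with a proper complex type for the address.
CREATE TABLE EmpParsed (
  EmpID   INT,
  name    STRING,
  address STRUCT<street:STRING, city:STRING, country:STRING>
);

-- Populate it from the string-typed staging table.
INSERT OVERWRITE TABLE EmpParsed
SELECT EmpID,
       get_json_object(Details, '$.name'),
       named_struct('street',  get_json_object(Details, '$.address.street'),
                    'city',    get_json_object(Details, '$.address.city'),
                    'country', get_json_object(Details, '$.address.country'))
FROM   EmpHiveStore;

After that, a query like SELECT name, address.street FROM EmpParsed; works directly.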
Another option worth considering is to have Sqoop dump out just the JSON field and store it as a text file, rather than going straight to a Hive table. You can then use the Spark DataFrame JSON reader to read it in and infer the complex JSON schema for you, and use that to build a Hive table with Spark's saveAsTable function; a sketch follows.
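A rough PySpark sketch of that route (Spark 2 API; the HDFS path, the table name, and the Sqoop command in the comment are illustrative assumptions, not from the original post):

# Assumes the JSON column was dumped to HDFS with something like:
#   sqoop import --connect jdbc:mysql://host/db --username user -P \
#     --query 'SELECT Details FROM EmpData WHERE $CONDITIONS' \
#     --target-dir /tmp/emp_json -m 1
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("json-to-hive")
         .enableHiveSupport()
         .getOrCreate())

# The JSON reader infers the nested schema (name, address.street, ...).
df = spark.read.json("/tmp/emp_json")
df.printSchema()

# Persist as a Hive table; the nested fields stay queryable, e.g.
#   SELECT name, address.street FROM emp_from_json;
df.write.saveAsTable("emp_from_json")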
Created 07-30-2016 02:23 PM
Thanks @Simon,
I put it into a string field in the Hive table and query it using get_json_object. That works for me.
But I have another set of data in HDFS, like:
1023,UK,{"cities":{"city1":"London","city2":"Birmingham","city3":"Liverpool"},"universities":{"universities1":"Cambridge","universities2":"Oxford"}},07-30-2016
So I want to store it in a Hive table with a schema like:
create table data (SerNo int, country string, detail string, date string)
Then what should the table definition be so that {"cities": ...} comes in as one column and the rest in the others? What should the column separator be?
If I put everything into one string field in the Hive table, how do I query the SerNo, country, and date columns? Is that possible with get_json_object?