Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How can i import JSON field data from mySql server into Hive table using sqoop.

avatar
Explorer

I have a table name EmpData in mySql server with two field as EmpID (INT) and Details (nvarchar). Deatils field contain JSON string .

EmpIDDetails
567

{ "name": "Michel","address":{"house no":"12","street":"Johnson road","city":"London","country":"UK"}}

927

{ "name": "John","address":{"house no":"99","street":"Johnson road","city":"London","country":"UK"}}

I want to import "Details" field JSON data into HDP hive table EmpHiveStore and query on hive table like

SELECT name,address.street from EmpHiveStore;

Is there any way to import JSON field data into Hive table using sqoop ?

Thank You.

1 ACCEPTED SOLUTION

avatar
Guru

You JSON there is just going to be stored as a string by the looks of it, unless your RDBMS is using a JSON type.

To import this, you can just have that put into a string field and then use Hive to convert that to a map type for instance, or just use hive udfs to query it as json from the string type field (try using get_json_object in Hive for instance).

Once you've got the bits you want there you can transform them to a complex type table with proper maps for instance.

Another option worth considering would be to have sqoop just dump out just the json field, and have that stored as a text file rather than going to a hive table. You can then use the spark dataframe json reader to read in and infer the complex json schema for you, and then use that to build a Hive table with the spark saveAsTable function.

View solution in original post

2 REPLIES 2

avatar
Guru

You JSON there is just going to be stored as a string by the looks of it, unless your RDBMS is using a JSON type.

To import this, you can just have that put into a string field and then use Hive to convert that to a map type for instance, or just use hive udfs to query it as json from the string type field (try using get_json_object in Hive for instance).

Once you've got the bits you want there you can transform them to a complex type table with proper maps for instance.

Another option worth considering would be to have sqoop just dump out just the json field, and have that stored as a text file rather than going to a hive table. You can then use the spark dataframe json reader to read in and infer the complex json schema for you, and then use that to build a Hive table with the spark saveAsTable function.

avatar
Explorer

Thanks @Simon,

I put into a string field in hive table and query it using get_json_object. That works for me.

But i have another set of data in HDFS like.

1023,UK,{"cities":{"city1":"London","city2":"Birmingham","city3":"Liverpool"},"universities":{"universities1":"Cambridge","universities2":"Oxford"}},07-30-2016

So i want to store it in a hive table with schema like:

create table data (SerNo int, country string , detail string,date string )

Then what should be the table definition so that {"cities: ..... } will come as one column and rest with other ? what should be the column separator ?

If i put everything as one string field in hive table , then how i query SerNo ,country and date column. Is it possible by get_json_object ?