Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

How can i import JSON field data from mySql server into Hive table using sqoop.

avatar
Frequent Visitor

I have a table name EmpData in mySql server with two field as EmpID (INT) and Details (nvarchar). Deatils field contain JSON string .

EmpIDDetails
567

{ "name": "Michel","address":{"house no":"12","street":"Johnson road","city":"London","country":"UK"}}

927

{ "name": "John","address":{"house no":"99","street":"Johnson road","city":"London","country":"UK"}}

I want to import "Details" field JSON data into HDP hive table EmpHiveStore and query on hive table like

SELECT name,address.street from EmpHiveStore;

Is there any way to import JSON field data into Hive table using sqoop ?

Thank You.

1 ACCEPTED SOLUTION

avatar
Guru

You JSON there is just going to be stored as a string by the looks of it, unless your RDBMS is using a JSON type.

To import this, you can just have that put into a string field and then use Hive to convert that to a map type for instance, or just use hive udfs to query it as json from the string type field (try using get_json_object in Hive for instance).

Once you've got the bits you want there you can transform them to a complex type table with proper maps for instance.

Another option worth considering would be to have sqoop just dump out just the json field, and have that stored as a text file rather than going to a hive table. You can then use the spark dataframe json reader to read in and infer the complex json schema for you, and then use that to build a Hive table with the spark saveAsTable function.

View solution in original post

2 REPLIES 2

avatar
Guru

You JSON there is just going to be stored as a string by the looks of it, unless your RDBMS is using a JSON type.

To import this, you can just have that put into a string field and then use Hive to convert that to a map type for instance, or just use hive udfs to query it as json from the string type field (try using get_json_object in Hive for instance).

Once you've got the bits you want there you can transform them to a complex type table with proper maps for instance.

Another option worth considering would be to have sqoop just dump out just the json field, and have that stored as a text file rather than going to a hive table. You can then use the spark dataframe json reader to read in and infer the complex json schema for you, and then use that to build a Hive table with the spark saveAsTable function.

avatar
Frequent Visitor

Thanks @Simon,

I put into a string field in hive table and query it using get_json_object. That works for me.

But i have another set of data in HDFS like.

1023,UK,{"cities":{"city1":"London","city2":"Birmingham","city3":"Liverpool"},"universities":{"universities1":"Cambridge","universities2":"Oxford"}},07-30-2016

So i want to store it in a hive table with schema like:

create table data (SerNo int, country string , detail string,date string )

Then what should be the table definition so that {"cities: ..... } will come as one column and rest with other ? what should be the column separator ?

If i put everything as one string field in hive table , then how i query SerNo ,country and date column. Is it possible by get_json_object ?