How to import data from MongoDB to Hive or HBase?

Hi All,

I would like to know how I can import data (documents) from MongoDB into Hive or HBase.

Best Regards

1 ACCEPTED SOLUTION

Super Guru
@Hamza FRIOUA

The best option is to use the Mongo Hadoop connector with Hive external tables, but you need to build the jar yourself or use a prebuilt one.

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  work STRUCT<title:STRING, hours:INT>
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","work.title":"job.position"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/test.persons');
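
A table defined this way only maps the MongoDB collection, so every query still reads from MongoDB. To copy the documents into a native Hive table, one option is CREATE TABLE ... AS SELECT. The helper below is a minimal sketch, not part of the connector: build_ctas and the target name individuals_native are made up for illustration, and it assumes the hive CLI is on the PATH with the connector jars already installed.

```shell
# Build a CTAS statement that copies a MongoDB-backed table into a native ORC table.
# $1 = source table (mapped to MongoDB), $2 = target table (hypothetical name).
build_ctas() {
  printf 'CREATE TABLE %s STORED AS ORC AS SELECT * FROM %s;\n' "$2" "$1"
}

# Run it with the Hive CLI (assumption: hive is on the PATH):
# hive -e "$(build_ctas individuals individuals_native)"
```

Because the copy is a regular managed table, its files should appear under the Hive warehouse directory in HDFS, unlike the mapped table.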

New Contributor

I tried the external table method, but I run out of memory. My MongoDB collection (table2) has 10 million records (0.755 GB), and reading from it works. After the insert task fails, a count on the native table (table1) returns 0 rows.

My query looks like this: "INSERT INTO table1 SELECT * FROM table2". If I add "LIMIT 1000" it works, but I need to migrate the entire collection. I attached the output from beeline.

@Hamza FRIOUA I wrote this a while back for a customer. The versions may have changed, but it should still be relevant. Essentially, it creates a test MongoDB instance, loads data, installs the storage handler, and creates a Hive table.

1. Install MongoDB: sudo yum install mongodb-org
   You may need to set up the following mongodb.repo file in /etc/yum.repos.d:

   [mongodb]
   name=MongoDB Repository
   baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
   gpgcheck=0
   enabled=1

2. Start mongodb: sudo service mongod start

3. Enter the mongo CLI by typing mongo

4. Add test data to db.testData (see http://docs.mongodb.org/manual/tutorial/generate-test-data/). MongoDB will implicitly create the database if it doesn't already exist. The loop below inserts 25 records; raise the upper bound if you need more:
   for (var i = 1; i <= 25; i++) { db.testData.insert( { x : i } ) }

5. To display the data, type: db.testData.find()

6. For an overview of the Hadoop connector, see http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/

7. From /root, download the mongo-hadoop git repo: git clone https://github.com/mongodb/mongo-hadoop.git

8. Navigate to /root/mongo-hadoop and type ./gradlew jar

9. Place the following .jar files in /usr/lib/hadoop/lib and /usr/lib/hive/lib:
   mongo-hadoop-core-1.4.0-SNAPSHOT.jar
   mongo-hadoop-hive-1.4.0-SNAPSHOT.jar
   mongo-hadoop-pig-1.4.0-SNAPSHOT.jar

10. Type hive on the command line to start the Hive shell

****Create Hive Table*****

CREATE EXTERNAL TABLE testdb ( id STRING, x INT )
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping' = '{"id":"_id", "x":"x"}')
TBLPROPERTIES('mongo.uri'='mongodb://127.0.0.1:27017/db.testData');

***********WARNING: If you leave out the EXTERNAL keyword, Hive will treat the MongoDB collection as its own managed data. Dropping the Hive table will then also remove the collection from MongoDB. ***********

11. You should now be able to see your MongoDB data by typing: SELECT * FROM testdb;
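
Step 9 above can be sketched as a small shell helper. This is only an illustration: install_connector_jars is a made-up name, and it assumes the jars produced by ./gradlew jar sit somewhere below the checkout (Gradle usually writes them to <module>/build/libs), so it searches for them instead of hard-coding the version.

```shell
# Copy every mongo-hadoop-*.jar found below a source directory into one or
# more destination lib directories.
# $1 = source directory (the mongo-hadoop checkout), remaining args = destinations.
install_connector_jars() {
  icj_src="$1"
  shift
  for icj_dest in "$@"; do
    mkdir -p "$icj_dest" || return 1
    # -exec runs cp once per matching jar, so file names with spaces stay intact
    find "$icj_src" -type f -name 'mongo-hadoop-*.jar' -exec cp {} "$icj_dest/" \; || return 1
  done
}

# On the cluster node (paths from steps 7 and 9; needs write access to both directories):
# install_connector_jars /root/mongo-hadoop /usr/lib/hadoop/lib /usr/lib/hive/lib
```

Restart the Hive shell after copying so that it picks up the new jars.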

Hope it helps!

Contributor

@Scott Shaw

I tried your example, but I can't find the table in HDFS:

http://localhost:50070/explorer.html#/user/hive/warehouse/testdb

It is missing even after I removed EXTERNAL.

Master Mentor

@HENI MAHER please open this as a new question and describe your problem in full.

Contributor

My question is related to Scott's answer.