How to import data from MongoDB to Hive or Hbase ?

Hi All,

I would like to know how I can import data from MongoDB (documents) to Hive or Hbase ?

Best Regards

ACCEPTED SOLUTION

Re: How to import data from MongoDB to Hive or Hbase ?

@Hamza FRIOUA

The best option is to use the Mongo Hadoop connector with Hive external tables, but you need to either build the jar yourself or use a prebuilt one.

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  work STRUCT<title:STRING, hours:INT>
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","work.title":"job.position"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/test.persons');
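
To make the mapping above easier to picture, here is a minimal sketch of the idea in JavaScript. It is not the connector's implementation: `getPath`, `toHiveRow`, and the sample document are hypothetical, and it assumes that a column with no entry in the mapping is read from the MongoDB field of the same name.

```javascript
// Hypothetical helper: read a dotted path such as "job.position" from a document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Hypothetical helper: build a flat, Hive-style row from a MongoDB document.
// hivePaths lists the Hive columns, with struct fields written as dotted paths.
function toHiveRow(doc, hivePaths, mapping) {
  const row = {};
  for (const hivePath of hivePaths) {
    // Assumption: an unmapped column uses the MongoDB field with the same name.
    const mongoPath = mapping[hivePath] || hivePath;
    const value = getPath(doc, mongoPath);
    row[hivePath] = value === undefined ? null : value;
  }
  return row;
}

// Same mapping as in the SERDEPROPERTIES clause above.
const mapping = { "id": "_id", "work.title": "job.position" };
const hivePaths = ["id", "name", "age", "work.title"];

// Made-up sample document, for illustration only.
const person = { _id: 7, name: "Ada", age: 36, job: { position: "Engineer" } };

const row = toHiveRow(person, hivePaths, mapping);
console.log(row);
```

The Hive column id is filled from _id and work.title from job.position, while name and age fall through to the fields of the same name.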

Re: How to import data from MongoDB to Hive or Hbase ?

New Contributor

I tried the external table method, but I ran out of memory. My Mongo collection (table2) has 10 million records (0.755 GB), and reading from it works. After the insert task fails, a count on the native table (table1) returns 0 rows.

My query looks like this: "INSERT INTO table1 SELECT * FROM table2". If I add "LIMIT 1000" it works, but I need to migrate the entire collection. I attached the output from beeline.

Re: How to import data from MongoDB to Hive or Hbase ?

@Hamza FRIOUA I wrote this a while back for a customer. The versions may have changed, but it should still be relevant. Essentially, it creates a test MongoDB instance, loads data, installs the storage handler, and creates a Hive table.

1. Install MongoDB: sudo yum install mongodb-org
   You may need to set up the following mongodb.repo file in /etc/yum.repos.d:

   [mongodb]
   name=MongoDB Repository
   baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
   gpgcheck=0
   enabled=1

2. Start mongodb: sudo service mongod start

3. Enter the mongo CLI by typing mongo

4. Type the following to add test data to db.testData (see http://docs.mongodb.org/manual/tutorial/generate-test-data/). MongoDB implicitly creates the database if it does not already exist. The loop inserts 25 records, but the count can be increased if needed:

   for (var i = 1; i <= 25; i++) { db.testData.insert( { x : i } ) }

5. To display the data, type: db.testData.find()

6. See the getting-started guide for the Hadoop connector: http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/

7. From /root, download the mongo-hadoop git repo: git clone https://github.com/mongodb/mongo-hadoop.git

8. Navigate to /root/mongo-hadoop and type ./gradlew jar

9. Place the following .jar files in /usr/lib/hadoop/lib and /usr/lib/hive/lib:
   mongo-hadoop-core-1.4.0-SNAPSHOT.jar
   mongo-hadoop-hive-1.4.0-SNAPSHOT.jar
   mongo-hadoop-pig-1.4.0-SNAPSHOT.jar

10. Type hive on the command line to start the Hive shell

**** Create Hive Table ****

CREATE EXTERNAL TABLE testdb ( id STRING, x INT )
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping' = '{"id":"_id", "x":"x"}')
TBLPROPERTIES('mongo.uri'='mongodb://127.0.0.1:27017/db.testData');

**** WARNING: If you leave out the EXTERNAL keyword, Hive will use the MongoDB collection as the primary source, and dropping the Hive table will remove the collection from MongoDB. ****

11. You should now be able to see your MongoDB data by typing "SELECT * FROM testdb;"
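
If the SELECT returns no rows, one thing worth checking is whether the test data landed in the database that the mongo.uri names. The path of that URI has the form database.collection, while `db` in the mongo shell is a variable holding the current database (`test` by default). The sketch below is one hedged way to line the two up: it builds the same 25 documents as step 4 and writes them to the database and collection taken from the URI. `splitNamespace` is a hypothetical helper, and `insertMany` is used because newer shells mark `insert` as deprecated.

```javascript
// Build the same 25 test documents as step 4: { x: 1 } ... { x: 25 }.
const docs = [];
for (let i = 1; i <= 25; i++) {
  docs.push({ x: i });
}

// Hypothetical helper: split the path of a mongo.uri value into database
// and collection, e.g. "db.testData" -> database "db", collection "testData".
function splitNamespace(mongoUri) {
  const path = mongoUri.replace(/^mongodb:\/\/[^/]+\//, "");
  const dot = path.indexOf(".");
  return { database: path.slice(0, dot), collection: path.slice(dot + 1) };
}

// URI copied from the CREATE EXTERNAL TABLE statement in the steps above.
const target = splitNamespace("mongodb://127.0.0.1:27017/db.testData");

// Inside the mongo shell, `db` is the current database object. Outside it
// (for example in Node.js) this branch is skipped and nothing is inserted.
if (typeof db !== "undefined") {
  db.getSiblingDB(target.database)
    .getCollection(target.collection)
    .insertMany(docs);
}
```

Pasted into the mongo shell, this should leave 25 documents in the collection that the Hive table reads from. Run anywhere else, it only builds the array and parses the URI.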

Hope it helps!

Re: How to import data from MongoDB to Hive or Hbase ?

Explorer

@Scott Shaw

I tried your example, but I can't find the testdb table in HDFS:

http://localhost:50070/explorer.html#/user/hive/warehouse/testdb

It is missing even when I remove EXTERNAL.

Re: How to import data from MongoDB to Hive or Hbase ?

Mentor

@HENI MAHER please open this as a new question and describe your problem in full.

Re: How to import data from MongoDB to Hive or Hbase ?

Explorer

My question is related to Scott's answer.
