Created on 06-08-2016 03:38 PM - edited 09-16-2022 03:24 AM
Hi All,
I would like to know how I can import data (documents) from MongoDB into Hive or HBase.
Best Regards
Created 06-08-2016 03:43 PM
The best option would be to use the Mongo Hadoop connector with Hive external tables, but you need to build the jar manually or use a prebuilt one.
https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
CREATE TABLE individuals ( id INT, name STRING, age INT, work STRUCT<title:STRING, hours:INT> )
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","work.title":"job.position"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/test.persons');
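Note that a table defined this way only points at MongoDB; the rows stay in the collection and are read at query time. If the goal is to land the data in Hive itself, one option is to copy it into a native table with a CREATE TABLE ... AS SELECT. A minimal sketch, assuming the individuals table above already exists; the file path, the target name individuals_native and the ORC format are placeholders I picked, not anything the connector requires:

```shell
#!/usr/bin/env bash
# Sketch: write the copy statement to a .hql file so it can be reviewed before running.
set -euo pipefail

# Hypothetical names, adjust to your environment.
HQL_FILE="${HQL_FILE:-${TMPDIR:-/tmp}/copy_individuals.hql}"
NATIVE_TABLE="${NATIVE_TABLE:-individuals_native}"

cat > "$HQL_FILE" <<EOF
-- Copy the rows of the Mongo-backed table into a native Hive table.
CREATE TABLE ${NATIVE_TABLE} STORED AS ORC AS
SELECT * FROM individuals;
EOF

echo "Wrote ${HQL_FILE}; run it with: hive -f ${HQL_FILE}"
```

After the statement has run, the native table lives in the Hive warehouse and no longer depends on MongoDB being reachable.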
Created 10-17-2017 02:36 PM
I tried the external table method, but I run out of memory. My Mongo collection (table2) has 10 million records (0.755 GB), and reading from it works. After the insert task fails, a count on the native table (table1) returns 0 rows.
My query looks like this: "INSERT INTO table1 SELECT * FROM table2". If I add "LIMIT 1000" it works, but I need to migrate the entire collection. I attached the output from beeline.
Created 06-08-2016 05:25 PM
@Hamza FRIOUA I wrote this a while back for a customer. The version may have changed, but it should still be relevant. Essentially, it creates a test MongoDB instance, loads data, installs the storage handler, and creates a Hive table.
1. Install MongoDB: sudo yum install mongodb-org
You may need to set up the following mongodb.repo file in /etc/yum.repos.d:
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
2. Start mongodb: sudo service mongod start
3. Enter the mongo CLI by typing mongo
4. Type the following to add test data to db.testData (see http://docs.mongodb.org/manual/tutorial/generate-test-data/). MongoDB will implicitly create the database if it doesn't already exist. The mongo shell starts in the test database, so run use db first if the data should land in the db database that the mongo.uri in the Hive table below points to. The loop inserts 25 records; raise the bound if you need more:
for (var i = 1; i <= 25; i++) { db.testData.insert( { x : i } ) }
5. To display the data, type: db.testData.find()
6. For background on the Hadoop connector, see http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
7. From /root, download the mongo-hadoop git repo: git clone https://github.com/mongodb/mongo-hadoop.git
8. Navigate to /root/mongo-hadoop and type ./gradlew jar
9. Place the following .jar files in /usr/lib/hadoop/lib and /usr/lib/hive/lib:
mongo-hadoop-core-1.4.0-SNAPSHOT.jar
mongo-hadoop-hive-1.4.0-SNAPSHOT.jar
mongo-hadoop-pig-1.4.0-SNAPSHOT.jar
10. Type hive on the command line to start the Hive shell
****Create Hive Table*****
CREATE EXTERNAL TABLE testdb ( id STRING, x INT )
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping' = '{"id":"_id", "x":"x"}')
TBLPROPERTIES('mongo.uri'='mongodb://127.0.0.1:27017/db.testData');
***********WARNING: If you leave out the EXTERNAL keyword, Hive will treat the MongoDB collection as data it manages. Dropping the Hive table will then also remove the collection from Mongo. ***********
11. You should now be able to see your MongoDB data by typing "SELECT * FROM testdb;"
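If you would rather not copy jars into the Hadoop and Hive lib directories (step 9), Hive's ADD JAR statement can register them for a single session instead. A rough sketch, assuming the jars were built under /root/mongo-hadoop as in steps 7 and 8; the JAR_DIR and HQL_FILE variables and the write_hql helper are names I made up for illustration. The connector also needs the MongoDB Java driver on the classpath, so that jar may have to be added the same way if Hive can't find it.

```shell
#!/usr/bin/env bash
# Sketch: generate a HiveQL file that registers the connector jars for the session,
# then creates and queries the Mongo-backed table from the steps above.
set -euo pipefail

# Hypothetical locations, adjust to wherever the build put the jars.
JAR_DIR="${JAR_DIR:-/root/mongo-hadoop}"
HQL_FILE="${HQL_FILE:-${TMPDIR:-/tmp}/mongo_testdb.hql}"

write_hql() {
  local jar_dir="$1" out="$2" jars="" jar
  : > "$out"
  # Collect connector jars; the list stays empty if the directory is missing.
  if [ -d "$jar_dir" ]; then
    jars="$(find "$jar_dir" -name 'mongo-hadoop-*.jar' 2>/dev/null | sort || true)"
  fi
  # One ADD JAR line per jar found.
  while IFS= read -r jar; do
    if [ -n "$jar" ]; then
      echo "ADD JAR ${jar};" >> "$out"
    fi
  done <<< "$jars"
  # Same table definition and query as in the steps above.
  cat >> "$out" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS testdb ( id STRING, x INT )
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping' = '{"id":"_id", "x":"x"}')
TBLPROPERTIES('mongo.uri'='mongodb://127.0.0.1:27017/db.testData');
SELECT * FROM testdb;
EOF
}

write_hql "$JAR_DIR" "$HQL_FILE"
echo "Wrote ${HQL_FILE}; run it with: hive -f ${HQL_FILE}"
```

Nothing here touches the cluster; it only writes the file, so you can read it over before handing it to hive -f.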
Hope it helps!
Created 05-03-2017 03:56 PM
@Scott Shaw I tried your example, but I can't find the table testdb in HDFS:
http://localhost:50070/explorer.html#/user/hive/warehouse/testdb
It doesn't show up even when I remove EXTERNAL.
Created 05-03-2017 05:10 PM
@HENI MAHER please open this as a new question and describe your problem in full.
Created 06-08-2016 05:25 PM
I wrote a short tutorial on doing just that: https://community.hortonworks.com/content/repo/4538/hdp-mongo-tutorial.html
Created 05-03-2017 05:43 PM
My question is related to Scott's answer.