How to connect MongoDB with Hadoop and Spark?
Labels: Apache Hadoop, Apache Spark
Created ‎12-16-2016 11:56 PM
I did a bit of research and learned about the Mongo-Hadoop project, but I am not clear whether the project is also helpful for connecting to Spark.
Created ‎12-17-2016 12:05 AM
The mongo-hadoop project connects both Hadoop and Spark with MongoDB. You can download it from the releases page (https://github.com/mongodb/mongo-hadoop/releases) or build it yourself from the source at https://github.com/mongodb/mongo-hadoop. If you decide to build it yourself, you can use the bundled Gradle wrapper (gradlew) with the following steps, then copy the resulting jar into lib/:
    wget -P /tmp/ https://github.com/mongodb/mongo-hadoop/archive/r1.5.1.tar.gz
    mkdir mongo-hadoop
    tar -xvzf /tmp/r1.5.1.tar.gz -C mongo-hadoop --strip-components=1
    # Now build the mongo-hadoop-spark jars
    cd mongo-hadoop
    ./gradlew jar
    cd ..
    cp mongo-hadoop/spark/build/libs/mongo-hadoop-spark-*.jar lib/
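Once the jar is in lib/, Spark still has to be told about it. The following is only a rough sketch of one way to do that: a small Python helper that assembles a spark-submit command with every jar found in lib/ passed through the standard `--jars` option. The helper name `build_spark_submit_command` and the script name `my_job.py` are made up for illustration, and the job will probably also need the MongoDB Java driver jar on the classpath.

```python
import glob
import os


def build_spark_submit_command(app_path, lib_dir="lib"):
    """Return a spark-submit argument list that ships every jar in lib_dir."""
    # Sort the jar paths so the generated command is reproducible.
    jars = sorted(glob.glob(os.path.join(lib_dir, "*.jar")))
    command = ["spark-submit"]
    if jars:
        # --jars expects a single comma-separated list of jar paths.
        command += ["--jars", ",".join(jars)]
    command.append(app_path)
    return command


if __name__ == "__main__":
    # my_job.py is a hypothetical application script, not part of mongo-hadoop.
    print(" ".join(build_spark_submit_command("my_job.py")))
```

The printed command can be pasted into a shell, or the list can be handed to `subprocess.run` as is.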
Created ‎12-17-2016 12:07 AM
You did not specify your use case, but be aware of some limitations when working with .bson files: https://github.com/mongodb/mongo-hadoop/wiki/Using-.bson-Files
You may also want to connect PySpark to MongoDB. A good reference: https://www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-3-spark-example-key-takeaways
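For the PySpark route, here is a minimal sketch of reading a collection through the mongo-hadoop input format with `SparkContext.newAPIHadoopRDD`. It assumes the mongo-hadoop jars and the MongoDB Java driver are already on the Spark classpath. The helper names, the URI `mongodb://localhost:27017/mydb.mycollection`, and the choice of `Text`/`MapWritable` as key and value classes are assumptions for illustration, so check them against the version you build.

```python
def mongo_input_conf(uri):
    """Hadoop configuration telling mongo-hadoop which collection to read."""
    return {"mongo.input.uri": uri}


def load_collection(sc, uri):
    """Return an RDD of (key, document) pairs read from the collection at uri."""
    return sc.newAPIHadoopRDD(
        inputFormatClass="com.mongodb.hadoop.MongoInputFormat",
        # Key/value classes below are an assumption; adjust if conversion fails.
        keyClass="org.apache.hadoop.io.Text",
        valueClass="org.apache.hadoop.io.MapWritable",
        conf=mongo_input_conf(uri),
    )


def main():
    # Imported here so the helpers above can be used without pyspark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="mongo-hadoop-sketch")
    # Hypothetical database and collection names.
    rdd = load_collection(sc, "mongodb://localhost:27017/mydb.mycollection")
    print(rdd.first())
    sc.stop()
```

Calling `main()` from a script submitted with spark-submit runs the read end to end; `main()` is deliberately not invoked here because it needs a running MongoDB instance.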
Created ‎12-17-2016 12:46 AM
Thank you. I'll test it and let you know.
Created ‎05-16-2017 06:07 AM
Hi @Jane Becker,
Apart from the answer above, on the Spark side, I believe you can use JDBC to extract the data into a DataFrame.
Spark supports loading and saving data through JDBC drivers; this is covered in the Spark SQL documentation.
PS: I have not tested this with MongoDB; I hope it works, provided the MongoDB JDBC driver follows the generic JDBC driver standard.
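To make the JDBC idea concrete, here is a rough sketch using Spark's generic JDBC data source (`spark.read.format("jdbc")`). It is as untested against MongoDB as the suggestion itself. The URL, table name, and driver class must come from whichever JDBC driver you use, so every value passed in is a placeholder, and the helper names are hypothetical.

```python
def jdbc_options(url, table, driver):
    """Options understood by Spark's built-in JDBC data source."""
    return {"url": url, "dbtable": table, "driver": driver}


def read_jdbc_table(spark, url, table, driver):
    """Load one table (or collection exposed as a table) into a DataFrame."""
    # spark is an existing SparkSession; the driver jar must be on its classpath.
    return spark.read.format("jdbc").options(**jdbc_options(url, table, driver)).load()
```

With a real session this would be called as `read_jdbc_table(spark, url, table, driver)`, with all three strings taken from the driver's own documentation.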
