How to connect MongoDB with Hadoop and Spark?


New Contributor

I did a bit of research and learned about the Mongo-Hadoop project, but I am not clear whether the project is also helpful for connecting to Spark.

1 ACCEPTED SOLUTION


Re: How to connect MongoDB with Hadoop and Spark?

@Jane Becker

The mongo-hadoop project connects both Hadoop and Spark to MongoDB. You can download it from the releases page (https://github.com/mongodb/mongo-hadoop/releases) or build it yourself from the source at https://github.com/mongodb/mongo-hadoop. If you decide to build it yourself, you can use gradlew with the following steps, then copy the resulting jar into lib/:

wget -P /tmp/ https://github.com/mongodb/mongo-hadoop/archive/r1.5.1.tar.gz
mkdir mongo-hadoop
tar -xvzf /tmp/r1.5.1.tar.gz -C mongo-hadoop --strip-components=1

# Now build the mongo-hadoop-spark jars
cd mongo-hadoop
./gradlew jar
cd ..
cp mongo-hadoop/spark/build/libs/mongo-hadoop-spark-*.jar lib/
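
Once the jar has been copied into lib/, Spark still has to be told about it. The following is a minimal sketch of one way to do that from Python, assuming pyspark is installed and that lib/ is relative to the directory the job is started from. The helper names spark_jars_value and build_session are made up for illustration, and depending on the setup other jars (for example the MongoDB Java driver) may also need to be on the classpath.

```python
import glob
import os


def spark_jars_value(lib_dir="lib", pattern="mongo-hadoop-spark-*.jar"):
    """Return a comma-separated list of matching jar paths for `spark.jars`.

    `lib_dir` and `pattern` mirror the `cp ... lib/` step above; both are
    assumptions about where the built jar ended up.
    """
    jars = sorted(glob.glob(os.path.join(lib_dir, pattern)))
    if not jars:
        raise FileNotFoundError(f"no jar matching {pattern!r} in {lib_dir!r}")
    # Spark expects `spark.jars` as a comma-separated list of paths.
    return ",".join(os.path.abspath(jar) for jar in jars)


def build_session(app_name="mongo-hadoop-example", lib_dir="lib"):
    """Create a SparkSession with the built jar(s) passed via `spark.jars`."""
    # Imported lazily so the helper above can be used without pyspark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars", spark_jars_value(lib_dir))
        .getOrCreate()
    )
```

Calling build_session() from the directory that contains lib/ should give a session that ships the jar to the driver and executors. Passing the same path to spark-submit with --jars is an equivalent alternative.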

Re: How to connect MongoDB with Hadoop and Spark?

You did not specify the use case, but be aware of some limitations on bson files: https://github.com/mongodb/mongo-hadoop/wiki/Using-.bson-Files

You may also want to connect PySpark to MongoDB. A good reference: https://www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-3-spark-example-key-takeaways
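
For the PySpark route, a rough sketch of reading a collection through the mongo-hadoop input format could look like the following. It assumes the mongo-hadoop jars are already on the Spark classpath and that the input format class and the mongo.input.uri setting are as described in the mongo-hadoop documentation; the host, database, and collection names and the two helper functions are placeholders invented for this example.

```python
# Class names as used by mongo-hadoop; treat them as assumptions and check
# them against the version of the connector you actually built.
MONGO_INPUT_FORMAT = "com.mongodb.hadoop.MongoInputFormat"
KEY_CLASS = "org.apache.hadoop.io.Text"
VALUE_CLASS = "org.apache.hadoop.io.MapWritable"


def mongo_input_conf(host, database, collection, port=27017):
    """Build the Hadoop configuration dict that points at one collection."""
    uri = f"mongodb://{host}:{port}/{database}.{collection}"
    return {"mongo.input.uri": uri}


def read_collection(sc, conf):
    """Read the configured collection as an RDD of (key, document) pairs.

    `sc` is an existing pyspark SparkContext.
    """
    return sc.newAPIHadoopRDD(
        MONGO_INPUT_FORMAT,
        KEY_CLASS,
        VALUE_CLASS,
        conf=conf,
    )
```

With a running SparkContext sc, something like read_collection(sc, mongo_input_conf("localhost", "exampledb", "examplecoll")) would return the RDD. Nothing is read from MongoDB until an action such as take() or count() is called on it.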

Re: How to connect MongoDB with Hadoop and Spark?

New Contributor

@Constantin Stanca

Thank you. I'll test it and let you know.

Re: How to connect MongoDB with Hadoop and Spark?

Super Collaborator

Hi @Jane Becker,

In addition to the answer above, on the Spark side I believe you can use JDBC to extract the data into a DataFrame.

Spark supports loading and saving data through JDBC drivers, and documentation can be found here

PS: I have not tested this with MongoDB. I hope it works, provided the MongoDB JDBC driver conforms to the generic JDBC driver standard.
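
As a rough illustration of the JDBC idea, the sketch below uses Spark's generic JDBC data source from PySpark. Like the suggestion itself, it is untested against MongoDB: the url, dbtable, driver, user, and password option names are Spark's, but the JDBC URL and driver class shown in the usage note are hypothetical placeholders that depend entirely on which MongoDB JDBC driver is used. The helper names are also made up for this example.

```python
def jdbc_options(url, table, driver, user=None, password=None):
    """Collect the options Spark's JDBC data source expects into a dict."""
    options = {"url": url, "dbtable": table, "driver": driver}
    # Credentials are optional; only include them when provided.
    if user is not None:
        options["user"] = user
    if password is not None:
        options["password"] = password
    return options


def load_table(spark, options):
    """Load a table into a DataFrame through Spark's JDBC data source.

    `spark` is an existing pyspark SparkSession, and the JDBC driver jar
    must already be on the Spark classpath.
    """
    return spark.read.format("jdbc").options(**options).load()
```

A call might look like load_table(spark, jdbc_options("jdbc:mongodb://localhost:27017/exampledb", "examplecoll", "com.example.mongodb.jdbc.Driver")), where both the URL scheme and the driver class are stand-ins to be replaced with the values from the driver's own documentation.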