I am going to work on a project that needs Hortonworks and MongoDB with Spark/Scala.
After searching all over the Internet, I put these steps together, but I am not sure they are correct; I hope a guru will jump in and help me with my plan.
This is the rough plan I have for now:
1. Install a multi-node Hortonworks cluster, e.g. 10 nodes (2 namenodes, 5 datanodes and 3 edge nodes).
2. Install MongoDB on one of the edge nodes as the MongoDB server.
3. Download a huge JSON/BSON file from one of the websites and import it into MongoDB for testing.
4. Download the mongodb-hadoop connector from github.com and install it on the MongoDB server. (Please give an example of using this connector.)
5. From Spark/Scala, call MongoDB free-style, using MongoClient, MongoDatabase, etc.
6. After all the steps above are cleared, Kafka + Flume (or Flafka) will kick in for real-time streaming...
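For step 3, a minimal sketch of importing a JSON file with mongoimport; the database name, collection name, and file name here are illustrative assumptions, not from your setup:

```shell
# Import a large JSON file into MongoDB for testing.
# "testdb", "testcoll" and sample_data.json are assumptions; substitute your own.
mongoimport --host localhost --port 27017 \
  --db testdb --collection testcoll \
  --file sample_data.json \
  --jsonArray   # drop --jsonArray if the file has one JSON document per line
```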
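For step 4, here is a rough sketch of reading a MongoDB collection into a Spark RDD via the mongo-hadoop connector, e.g. from spark-shell with the connector jar and the MongoDB Java driver on the classpath. The URI, database, and collection names are assumptions:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.BSONObject
import com.mongodb.hadoop.MongoInputFormat

// Hadoop configuration telling the connector where to read from.
// "testdb.testcoll" on localhost is an assumption; point this at your edge node.
val mongoConfig = new Configuration()
mongoConfig.set("mongo.input.uri", "mongodb://localhost:27017/testdb.testcoll")

val sc = new SparkContext(new SparkConf().setAppName("MongoHadoopExample"))

// Each record comes back as a (documentId, BSONObject) pair.
val documents = sc.newAPIHadoopRDD(
  mongoConfig,
  classOf[MongoInputFormat],
  classOf[Object],
  classOf[BSONObject])

println(s"Read ${documents.count()} documents from MongoDB")
```

Note the connector reads through the MongoInputFormat Hadoop InputFormat, so the same configuration style also works from plain MapReduce jobs.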
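For step 5, a minimal sketch of calling MongoDB directly from Scala with the MongoDB Java driver's MongoClient and MongoDatabase classes; host, port, and names are assumptions:

```scala
import com.mongodb.MongoClient
import org.bson.Document

// Connect to the MongoDB server on the edge node (host/port are assumptions).
val client = new MongoClient("localhost", 27017)

val database = client.getDatabase("testdb")          // MongoDatabase
val collection = database.getCollection("testcoll")  // MongoCollection[Document]

// Read back the first document, if the collection is non-empty.
val first: Document = collection.find().first()
println(first)

client.close()
```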
Please correct me if I am wrong, and share your experiences with me.
Thank you so much.
First of all, I would recommend that when you describe the solution you have in mind and ask for opinions here, you also describe what you are trying to accomplish, i.e. what your 'requirements' are. Without understanding the task at hand, it is impossible to advise on any proposed solution.
I would also recommend you consult this doc (please disregard if you have already done so): https://docs.mongodb.com/ecosystem/tutorial/getting-started-with-hadoop/ and then come back here with a more specific question.
Yes, you are right.
Basically, this is a Hortonworks + MongoDB environment with JSON data files.
I just found that you can install MongoDB on an edge node (maybe the Ambari machine) and add it as a service to the Ambari stack. It will work.
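A rough sketch of installing MongoDB on a RHEL/CentOS edge node, following MongoDB's documented yum setup; the 3.6 version and repo file name are assumptions, and as far as I know Ambari has no built-in MongoDB service, so exposing it in the Ambari stack requires a custom service definition:

```shell
# Register the MongoDB yum repository (version 3.6 is an assumption).
cat <<'EOF' | sudo tee /etc/yum.repos.d/mongodb-org-3.6.repo
[mongodb-org-3.6]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.6/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc
EOF

# Install and start the server on the edge node.
sudo yum install -y mongodb-org
sudo service mongod start
```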