Hadoop cluster lab environment

avatar

I'm going to do a bit more searching, but I thought I'd start a thread in case anybody has tips or pointers that might help. I am trying to provision a lab cluster environment for students to connect to. There's clearly more to consider than with other server configurations, and it's a bit beyond my scope, so I'm looking for some help 🙂

I could potentially have 50-80 students connecting to this per day, and I would like to configure a virtual environment for it. I've realized I don't even know how many concurrent connections would be allowed.

Between semesters I would need to basically remove any data and start again from scratch for the next group of students -- is there anything I should consider with regard to that requirement?

Also, in the opinion of more seasoned folks, would it be better to go physical or VM? And any suggestions on the resources (memory, disk space) I should give for either option?

Thanks in advance,

1 ACCEPTED SOLUTION

avatar
Super Guru

@Shannon Dyck

For an environment like the one you describe, I would recommend using Docker containers. Think about how much memory and CPU each student should get. Basically, set up a container and tune it so it works well for one student. Then it's simply a matter of giving every student a copy of that one container to work with. At the end of the semester, destroy the containers. If students want, they can copy their container and take it with them. You reclaim all your resources at the end of the semester and make them available when the new batch comes in. Check this out:

https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
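To make the "one tuned container per student" idea concrete, here is a minimal Python sketch that only builds the `docker run` and `docker rm` command lines, without executing them. The image name `hadoop-lab:latest`, the `lab-` container name prefix, and the default limits of 4 GB and 2 CPUs are assumptions for illustration, not values from this thread; tune them against what works for one student.

```python
import shlex


def run_command(student, image, memory="4g", cpus=2.0):
    """Build a `docker run` command for one student's container.

    `--memory` and `--cpus` cap what a single container may use, so one
    student cannot starve the others on a shared host.
    """
    return [
        "docker", "run", "-d",
        "--name", f"lab-{student}",   # hypothetical naming scheme
        "--memory", memory,
        "--cpus", str(cpus),
        image,
    ]


def teardown_command(student):
    """Build the end-of-semester cleanup command for one student.

    `-f` stops the container if it is still running; `-v` also removes
    its anonymous volumes so no student data is left behind.
    """
    return ["docker", "rm", "-f", "-v", f"lab-{student}"]


if __name__ == "__main__":
    # Hypothetical roster and image name, for illustration only.
    for student in ["student01", "student02"]:
        print(shlex.join(run_command(student, "hadoop-lab:latest")))
        print(shlex.join(teardown_command(student)))
```

Printing the commands first makes it easy to review them before running anything against a real Docker host.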

8 REPLIES 8

avatar

If I want them to be able to access this from anywhere, am I better off with Cloudbreak (or similar), or do they ssh to the container? Thanks again.

avatar
Super Guru

@Shannon Dyck

Cloudbreak is for spinning up clusters in the cloud. I would think it might be more expensive for universities, but maybe you already have some agreements. Curious, why not just have students run the Docker container on their own laptops? That would save a lot of resources and make things easier for you. Otherwise, have the Docker containers run on lab machines; for remote access, students will have to VPN in and ssh into them.
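For the hosted option with VPN and ssh, one possible arrangement is to give every student's container its own host port forwarded to the container's SSH port. The Python sketch below only computes that plan. It assumes the image runs an SSH server on port 22 and that each student logs in with their own ID; the host name `lab.example.edu` and the base port 2200 are hypothetical.

```python
def ssh_port_plan(students, base_port=2200):
    """Assign each student a unique host port, in sorted order."""
    return {s: base_port + i for i, s in enumerate(sorted(students))}


def publish_flag(host_port):
    """`docker run` arguments that forward host_port to container port 22."""
    return ["-p", f"{host_port}:22"]


def ssh_command(student, host_port, host="lab.example.edu"):
    """The command a student would run once connected to the VPN."""
    return f"ssh -p {host_port} {student}@{host}"


if __name__ == "__main__":
    plan = ssh_port_plan(["bob", "alice"])
    for student, port in plan.items():
        print(student, publish_flag(port), ssh_command(student, port))
```

Handing each student a single ssh line keeps the setup on their side to a minimum, which matters if many of them are not technical.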

avatar

I currently have them using the VirtualBox sandbox option, and very few people seem to be able to get the VM working on their own without coming to me for extra help. Troubleshooting 100+ laptops per semester has been time consuming, and many of these folks are not technical and most of the time don't have enough resources on their home machines, so I was looking for something super simple that they could connect to. I do think the Docker containers sound like the best idea. Also, the sandbox VMs I have now cannot travel with them, and the labs aren't open when some people need them, so I need to make sure they can do homework on their own. I think VPN and ssh is the way I will go; I just wanted to be sure. I appreciate your help!

avatar
Super Guru

You are very welcome. Please don't hesitate to ask any questions you might have in the future. Good luck.

avatar

I think I'm still struggling with this a little bit due to my lack of Docker knowledge. Since I last posted, I have been able to get a container running with Hive to test with, just so I could see how it all works, and I can run queries without issues. Now I'm curious: my overall goal was to set up something that students could use from anywhere (lab or home). If I build a server for them, am I essentially hosting their containers (are there any issues with hosting multiple containers using the same image)? Or am I hosting images for them to pull and run from whatever machine they are on? (I'm trying to understand server sizing requirements, which is difficult as I don't think I quite understand Docker, although I've picked up some reading material since I last posted as well.) I realize how newbie this makes me sound, and I'm sorry if this thread is filled with newb questions, but I'm making a bit of progress.

Thanks again!

avatar
Super Guru

See my replies inline. I must tell you that I am not a Docker expert, so I highly recommend this forum to ask your Docker questions. There are a bunch of good people there who would be able to help you.

1. If I build a server for them, am I essentially hosting their containers (are there any issues with hosting multiple containers using the same image)?

Answer: Yes. Use the same image for everyone. There shouldn't be any issue with that (at least theoretically). Remember, each student starts from the same initial image, but once they begin working, their own container will diverge from it depending on how they use it.

2. Or am I hosting images for them to pull and run from whatever machine they are on? (trying to understand server sizing requirements which is difficult as I don't think I quite understand docker, although I've picked up some reading material since I last posted as well).

Answer: You can do that too, but I recall you had issues with students running things on their own machines, because then you are helping seventy students, each with a different issue on their own laptop. (Quite frankly, I don't think you would be able to avoid this much even if you have your own servers - each student working on their own image will likely make the same mistakes they make on their own laptop and ask you for help.) Having students pull down the image would also be the cheapest option.

This doesn't make you sound like a newbie - I am not much of a Docker guy either, beyond the concepts and features of how it works. Check that other forum I talked about.
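On the server sizing part of the question, a rough back-of-the-envelope estimate for the hosted option can be sketched in Python as below. Every input is an assumption to replace with your own measurements: the share of students online at the same time, the memory one tuned container needs, and the fixed overhead reserved for the host itself (8 GB here is only a placeholder).

```python
import math


def estimate_host_memory_gb(students, concurrent_fraction,
                            gb_per_container, overhead_gb=8.0):
    """Estimate host RAM if only some students are active at once.

    concurrent_fraction is the assumed share of students whose
    containers run at the same time (0 < fraction <= 1).
    """
    if not 0 < concurrent_fraction <= 1:
        raise ValueError("concurrent_fraction must be in (0, 1]")
    concurrent = math.ceil(students * concurrent_fraction)
    return concurrent * gb_per_container + overhead_gb


if __name__ == "__main__":
    # 80 students, half of them active at once, 4 GB per container.
    print(estimate_host_memory_gb(80, 0.5, 4))  # 40 * 4 + 8 = 168.0
```

The estimate only holds if idle containers are actually stopped, or if each container runs with a memory limit, so it is a starting point rather than a guarantee.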
