Hadoop cluster lab environment

New Contributor

I'm going to do a bit more searching, but I thought I'd start a thread in case anybody has tips or pointers that might be helpful. I am trying to provision a lab cluster environment for students to connect to. There's clearly more to consider than with other server configurations, and it's a bit beyond my scope, so I'm looking for some help :)

I could have potentially 50-80 students connecting to this per day, and I would like to configure a virtual environment for it. I've realized I don't even know how many concurrent connections would be allowed.

Between semesters I would need to basically remove any data and start again from scratch for the next group of students -- is there anything I should consider with regard to that requirement?

Also, in the opinion of more seasoned folks, would it be better to go physical or VM? And any suggestions on the resources (memory, disk space) I should allocate for either option?

Thanks in advance,

Accepted solution

Re: Hadoop cluster lab environment

Super Guru

@Shannon Dyck

For an environment like the one you describe, I would recommend using Docker containers. Think about how much memory and CPU each student should get. Basically, set up a container and tune it so it works well for one student. Then it's simply a matter of giving every student a container like that to work with. At the end of the semester, destroy the containers. If students want, they can copy their container and take it with them. You reclaim all your resources at the end of the semester and make them available when the new batch comes in. Check this out:

https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
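As a rough illustration of the "one tuned container per student, destroyed at semester end" idea, here is a minimal Python sketch that only builds the `docker run` and `docker rm` command lines; it does not execute them. The image name `hadoop-lab:latest`, the 4 GB / 2 CPU limits, the student names, and the SSH port scheme are all assumptions for illustration, and the port mapping presumes the image runs an SSH server on port 22.

```python
import shlex

# All values below are illustrative assumptions, not recommendations.
IMAGE = "hadoop-lab:latest"   # hypothetical image tuned for one student
MEMORY = "4g"                 # assumed per-student memory cap
CPUS = "2"                    # assumed per-student CPU cap
BASE_SSH_PORT = 2200          # student N gets host port 2200 + N


def container_name(student):
    """Derive a predictable container name from a student id."""
    return f"lab-{student}"


def run_command(student, index):
    """Build the `docker run` command for one student's container."""
    return [
        "docker", "run", "-d",
        "--name", container_name(student),
        "--memory", MEMORY,                     # cap container memory
        "--cpus", CPUS,                         # cap container CPU
        "-p", f"{BASE_SSH_PORT + index}:22",    # assumes sshd inside the image
        IMAGE,
    ]


def teardown_command(student):
    """Build the end-of-semester command that stops and removes a container."""
    return ["docker", "rm", "-f", container_name(student)]


if __name__ == "__main__":
    students = ["alice", "bob"]  # hypothetical roster
    for i, s in enumerate(students, start=1):
        print(shlex.join(run_command(s, i)))
    for s in students:
        print(shlex.join(teardown_command(s)))
```

Printing the commands rather than running them makes it easy to review the plan for a whole roster before anything is created or destroyed.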

Re: Hadoop cluster lab environment

New Contributor

If I want them to be able to access this from anywhere, am I better off with Cloudbreak (or similar), or do they ssh to the container? Thanks again.

Re: Hadoop cluster lab environment

Super Guru

@Shannon Dyck

Cloudbreak is for spinning up clusters in the cloud. I would think it might be more expensive for universities, but maybe you already have some agreements. I'm curious: why not just have students run a Docker container on their own laptops? That would save a lot of resources and make things easier for you. Otherwise, have the Docker containers run on lab machines; for remote access, students will have to VPN in and ssh into them.

Re: Hadoop cluster lab environment

New Contributor

I currently have them using the VirtualBox sandbox option, and very few people seem to be able to get the VM to work on their own without coming to me for extra help. Troubleshooting 100+ laptops per semester has been time consuming, and many of these folks are not technical and often don't have enough resources on their home machines, so I was looking for something super simple that they could connect to. I do think Docker containers sound like the best idea. Also, the sandbox VMs I have now cannot travel with them, and the labs aren't open when some people need them, so I need to make sure they can do homework on their own. I think VPN and ssh is the way I will go; I just wanted to be sure. I appreciate your help!

Re: Hadoop cluster lab environment

Super Guru

You are very welcome. Please don't hesitate to ask any questions you might have in the future. Good luck.

Re: Hadoop cluster lab environment

New Contributor

I think I'm still struggling with this a little due to my lack of Docker knowledge. Since I last posted, I have been able to get a container running with Hive to test with, just so I could see how it all works, and I can run queries without issues. Now I'm curious: my overall goal was to set up something that students could use from anywhere (lab or home). If I build a server for them, am I essentially hosting their containers (are there any issues with hosting multiple containers using the same image)? Or am I hosting images for them to pull and run from whatever machine they are on? (I'm trying to understand server sizing requirements, which is difficult as I don't think I quite understand Docker, although I've picked up some reading material since I last posted as well.) I realize how newbie this makes me sound, and I'm sorry if this thread is filled with newb questions, but I'm making a bit of progress.

Thanks again!

Re: Hadoop cluster lab environment

Super Guru

See my reply inline. I must tell you that I am not a Docker expert, so I highly recommend this forum for your Docker questions. There are a bunch of good people there who would be able to help you.

1. If I build a server for them, am I essentially hosting their containers (are there any issues with hosting multiple containers using the same image)?

Answer: Yes. Use the same image for everyone. There shouldn't be any issue with that (at least theoretically). Remember that each student's container will diverge from the initial image as they work, depending on how they use it.

2. Or am I hosting images for them to pull and run from whatever machine they are on? (I'm trying to understand server sizing requirements, which is difficult as I don't think I quite understand Docker, although I've picked up some reading material since I last posted as well.)

Answer: You can do that too, but I recall you had issues with students running things on their own machines, because then you are helping seventy students, each with a different issue on their own laptop. (Quite frankly, I don't think you will be able to avoid this entirely even with your own servers, because each student working in their own container will likely make the same mistakes they would on their own laptop and ask you for help.) Having students pull down the image would also be the cheapest option.

This doesn't make you sound like a newbie. I am not much of a Docker guy either, beyond the concepts and features of how it works. Check that other forum I mentioned.
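On the server sizing question in point 2, a back-of-the-envelope estimate may help if the containers are hosted centrally. The sketch below is only arithmetic under stated assumptions: the peak concurrency fraction, the per-container memory and CPU figures, and the host overhead are placeholders to be replaced with numbers measured from one tuned container.

```python
import math


def size_server(students, peak_fraction, mem_gb_per_container,
                cpus_per_container, host_overhead_gb=8):
    """Estimate host resources for centrally hosted student containers.

    students: enrolled students (the thread mentions 50-80)
    peak_fraction: assumed share of students active at the same time
    mem_gb_per_container / cpus_per_container: assumed per-student limits
    host_overhead_gb: assumed memory reserved for the host OS and Docker
    """
    concurrent = math.ceil(students * peak_fraction)
    return {
        "concurrent_containers": concurrent,
        "memory_gb": concurrent * mem_gb_per_container + host_overhead_gb,
        "cpus": concurrent * cpus_per_container,
    }


if __name__ == "__main__":
    # Assumed inputs: 80 students, half active at peak, 4 GB and 2 CPUs each.
    print(size_server(80, 0.5, 4, 2))
```

If students instead pull the image and run it on their own machines, the server only needs to host the image, and this estimate applies to each laptop with a single container.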
