Expert Contributor

Keeping a Hadoop cluster running in the cloud for long periods is expensive, and engineers do not always have immediate access to a cloud environment where they can quickly spin up a cluster and experiment. As an easy alternative, an HDP cluster can be set up on your own laptop using Vagrant with VirtualBox as the provider.

Step 1 - Install prerequisites

Install VirtualBox and Vagrant on your laptop.

Step 2 - Generation of Vagrantfile

Create a working directory in which to generate the Vagrantfile and run the deployment via Vagrant:

$ mkdir hdp22
$ cd hdp22

The following command generates a Vagrantfile in the current directory. This file defines the VMs that will make up the cluster.

$ vagrant init 
$ vi Vagrantfile

Step 3 - Configuration of VMs in Vagrantfile

Let's configure Vagrant to use CentOS 6.6 or CentOS 6.7 as the base box by setting one of the following in the Vagrantfile:

config.vm.box = "bento/centos-6.7"
# or
# config.vm.box = "chef/centos-6.6"

The script below should be included in the Vagrantfile to perform some basic provisioning on each VM, such as:

1. Install the NTP service

2. Disable the firewall and SELinux

3. (Optional) Install wget

$script = <<SCRIPT
sudo yum -y install ntp wget
sudo chkconfig ntpd on
sudo chkconfig iptables off
sudo /etc/init.d/iptables stop
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
sudo sh -c 'echo "* soft nofile 10000" >> /etc/security/limits.conf'
sudo sh -c 'echo "* hard nofile 10000" >> /etc/security/limits.conf'
sudo sh -c 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag'
sudo sh -c 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled'
SCRIPT
config.vm.provision "shell", inline: $script

Step 4 - Configure the definition of VMs

The following configuration defines 4 virtual machines to be used in the HDP cluster:

1 Ambari server

1 Hadoop master

2 slaves

The machines have the following hostnames:

1. ambari1.mycluster

2. master1.mycluster

3. slave1.mycluster

4. slave2.mycluster

# Ambari1
config.vm.define :ambari1 do |a1|
  a1.vm.hostname = "ambari1.mycluster"
  a1.vm.network :private_network, ip: "192.168.0.11"
  a1.vm.provider :virtualbox do |vb|
    vb.memory = "2048"
  end

  a1.vm.network "forwarded_port", guest: 8080, host: 8080
  a1.vm.network "forwarded_port", guest: 80, host: 80
end

# Master1
config.vm.define :master1 do |m1|
  m1.vm.hostname = "master1.mycluster"
  m1.vm.network :private_network, ip: "192.168.0.12"
  m1.vm.provider :virtualbox do |vb|
    vb.memory = "4096"
  end
end

# Slave1
config.vm.define :slave1 do |s1|
  s1.vm.hostname = "slave1.mycluster"
  s1.vm.network :private_network, ip: "192.168.0.21"
  s1.vm.provider :virtualbox do |vb|
    vb.memory = "2048"
  end
end

# Slave2
config.vm.define :slave2 do |s2|
  s2.vm.hostname = "slave2.mycluster"
  s2.vm.network :private_network, ip: "192.168.0.22"
  s2.vm.provider :virtualbox do |vb|
    vb.memory = "2048"
  end
end

Step 5 - Start the Machines and Install Ambari Server

Start the Ambari server machine with Vagrant, which automatically runs the shell provisioner defined in the Vagrantfile, and then SSH into the Ambari server:

$ vagrant up ambari1
$ vagrant ssh ambari1

4966-vagrant1.png
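The commands above start only ambari1. The other three VMs presumably also need to be running before Ambari can register them in Step 6. A minimal sketch for bringing them up from the hdp22 directory on the laptop (not from inside a VM) follows. The function name start_cluster_nodes and the VAGRANT override variable are hypothetical and only there to allow a dry run.

```shell
#!/bin/sh
# Hypothetical helper: bring up the remaining nodes defined in the Vagrantfile,
# one at a time, stopping at the first failure, then print the machine states.
# Set VAGRANT=echo to print the commands instead of running them.
start_cluster_nodes() {
  vagrant_cmd="${VAGRANT:-vagrant}"
  for node in master1 slave1 slave2; do
    $vagrant_cmd up "$node" || return 1
  done
  $vagrant_cmd status
}
```

Run start_cluster_nodes with no arguments to start the VMs. Starting them one by one keeps memory pressure on the laptop easier to observe than a single vagrant up for everything.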

As the root user, run the following commands:

# Install
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.2.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
yum -y install ambari-server
sudo service ntpd start

# Setup. There are several options to configure during setup.
ambari-server setup

# Start Ambari Server
ambari-server start

Add the following entries to the /etc/hosts file on each VM:

192.168.0.11 ambari1.mycluster ambari1
192.168.0.12 master1.mycluster master1
192.168.0.21 slave1.mycluster slave1
192.168.0.22 slave2.mycluster slave2
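Because the same four entries go on every VM, one possible approach is a small shell function that appends only the lines that are missing, so it can be run more than once without creating duplicates. This is a sketch rather than part of the original procedure. The function name add_cluster_hosts is hypothetical, and it takes an optional file path so it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: append the cluster's host entries to a hosts file,
# skipping any line that is already present verbatim.
# Usage: add_cluster_hosts [path]   (path defaults to /etc/hosts; needs root for that file)
add_cluster_hosts() {
  hosts_file="${1:-/etc/hosts}"
  while IFS= read -r entry; do
    # -x matches the whole line, -F treats the entry as a literal string
    grep -qxF "$entry" "$hosts_file" || printf '%s\n' "$entry" >> "$hosts_file"
  done <<'EOF'
192.168.0.11 ambari1.mycluster ambari1
192.168.0.12 master1.mycluster master1
192.168.0.21 slave1.mycluster slave1
192.168.0.22 slave2.mycluster slave2
EOF
}
```

As root on each VM, calling add_cluster_hosts with no argument targets /etc/hosts. Matching whole lines means an existing loopback entry for the VM's own hostname does not stop the private-network address from being added.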

Set up passwordless SSH from the Ambari node to all other nodes (VMs):

$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Copy the Ambari server's public key to the authorized_keys file on each of the other nodes to allow communication later.
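One way to copy the key is with ssh-copy-id, looped over the other three hosts. The sketch below assumes that the hostnames already resolve through /etc/hosts and that password login for the vagrant user is enabled on the base box. If it is not, the public key has to be appended to each node's ~/.ssh/authorized_keys by hand. The function name distribute_key and the COPY_CMD override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: push the Ambari node's public key to every other node.
# Assumes the SSH user is vagrant, as used later in the Ambari wizard.
# Set COPY_CMD=echo to print the commands instead of running them.
distribute_key() {
  pubkey="${1:-$HOME/.ssh/id_rsa.pub}"
  copy_cmd="${COPY_CMD:-ssh-copy-id}"
  for host in master1.mycluster slave1.mycluster slave2.mycluster; do
    # ssh-copy-id appends the key to the remote user's authorized_keys
    $copy_cmd -i "$pubkey" "vagrant@$host" || return 1
  done
}
```

Afterwards, a command such as ssh vagrant@master1.mycluster from the Ambari node should log in without prompting for a password.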

Step 6 - Deploy HDP Cluster

We are ready to deploy an HDP cluster from the Ambari Web UI. Because the UI is really simple, I will omit most of the screenshots here.

  • Access http://192.168.0.11:8080/ from your laptop. The username and password are both admin.
  • Give the cluster a name.
  • Select the latest HDP version.
  • Enter the hostnames of the VMs (one per line) and the SSH private key of the Ambari server. The SSH user should be vagrant.

4969-vagrant-hdp-cluster-config.png

  • Accept the default options for the rest of the wizard.
  • Complete the wizard. It takes about 30 minutes to finish.

Now we are all set, with a 4-node HDP cluster ready on your local machine!

Comments
Super Guru

Step 5 installs Ambari 1.7, which is an older version. You should use this step to get the latest version (Ambari 2.2.2.0):

wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.2.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
Super Guru

In step 3, the script in the Vagrantfile could include:

sudo service ntpd start

The chkconfig command will ensure ntpd starts on bootup. However, I found ntpd did not auto start the first time the instance was brought up with Vagrant. Subsequent boots of the VM work properly.

Expert Contributor

Thanks for letting me know this.