Created on 08-03-2020 05:41 PM - edited 05-06-2021 08:50 AM
Cloudera Data Platform DC doesn't have a Quickstart/Sandbox VM like the ones available for CDH/HDP releases, which helped a lot of people (including me) learn more about the open-source components and see the community's improvements in CDP Runtime.
The objective of this tutorial is to create a VM from scratch with some automation (a shell script and a Cloudera template) to help anyone who wants to use and/or learn Cloudera CDP in a Sandbox/Quickstart-like environment on their own machine.
This exercise is performed on macOS, but you can also install Vagrant/VirtualBox on Windows/Linux machines (https://www.vagrantup.com/docs/installation).
The versions below were tested at the time of writing this blog and may change in the future.
The machine needs to have at least:
$ brew install --cask virtualbox
$ brew install --cask vagrant
$ brew install --cask vagrant-manager
The manager is optional and can be used to manage your virtual machines from the menu bar.
$ cd ~
$ mkdir cdpvm
$ cd cdpvm
$ wget https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box
$ wget https://raw.githubusercontent.com/carrossoni/CDPDCTrial/master/scripts/VMSetup.sh
$ vagrant box add CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box --name centos7
$ vagrant plugin install vagrant-disksize
$ vagrant init centos7
config.vm.box = "centos7"
config.vm.network "private_network", ip: "192.168.10.23"
config.vm.network "public_network"
config.vm.network :forwarded_port, guest: 7180, host: 7180
config.vm.hostname = "cloudera"
config.disksize.size = "100GB"
config.vm.provision "shell", path: "VMSetup.sh"
config.vm.provider "virtualbox" do |vb|
  # Display the VirtualBox GUI when booting the machine
  vb.gui = true
  # Customize the amount of memory on the VM:
  vb.memory = "10024"
  vb.cpus = "8"
end
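Before bringing the VM up, it may help to confirm that the host can actually back the memory and CPU count requested in the provider block. The following is only a rough sketch: host_mem_mb, host_cpus and fits_vm are hypothetical helper names, not part of Vagrant or VMSetup.sh, and the numbers are the vb.memory and vb.cpus values shown above.

```shell
# Sketch: compare host resources with what the Vagrantfile requests.
# host_mem_mb, host_cpus and fits_vm are hypothetical helpers for illustration.

host_mem_mb() {
  if [ "$(uname)" = "Darwin" ]; then
    # macOS reports physical memory in bytes
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
  else
    # Linux reports MemTotal in kB
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
  fi
}

host_cpus() {
  if [ "$(uname)" = "Darwin" ]; then
    sysctl -n hw.ncpu
  else
    nproc
  fi
}

# fits_vm HOST_MB HOST_CPUS VM_MB VM_CPUS
# Succeeds only if the host has at least the memory and CPUs the VM asks for.
fits_vm() {
  [ "$1" -ge "$3" ] && [ "$2" -ge "$4" ]
}

# Values taken from the provider block: vb.memory = "10024", vb.cpus = "8"
if fits_vm "$(host_mem_mb)" "$(host_cpus)" 10024 8; then
  echo "Host can back the requested VM size"
else
  echo "Consider lowering vb.memory / vb.cpus in the Vagrantfile"
fi
```

This check is deliberately conservative: it only tells you whether the host has as much as the VM requests, not whether enough is left over for the host itself.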
$ vagrant up
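Provisioning can take a while, so one option is to poll the forwarded Cloudera Manager port from the host instead of refreshing the browser. This is a minimal sketch: wait_until is a hypothetical retry helper, and the example probe assumes curl is installed on the host and that the 7180 port forward from the Vagrantfile is in place.

```shell
# Sketch: retry a probe command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper, not part of Vagrant or VMSetup.sh.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe quietly; return as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example probe (assumption: curl on the host, port 7180 forwarded as configured):
#   wait_until 60 10 curl -sf -o /dev/null http://localhost:7180 \
#     && echo "Cloudera Manager is answering on port 7180"
```

The attempt count and delay in the example are arbitrary; adjust them to how long provisioning takes on your machine.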
HUE HDFS Impala Ranger Zookeeper
$ vagrant ssh
Now you can use sudo inside the box and start exploring the machine. Check that the hostname and IP in /etc/hosts are configured properly (this is the most common issue, since it depends on your machine's network).
Our environment is now ready to work with and learn more about CDP!
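The /etc/hosts check can be scripted rather than done by eye. The sketch below assumes the hostname ("cloudera") and private IP (192.168.10.23) from the Vagrantfile; check_hosts_entry is a hypothetical helper name and is not something VMSetup.sh provides.

```shell
# Sketch: verify that a hosts file maps the expected IP to the expected hostname.
# check_hosts_entry is a hypothetical helper for illustration.
# Usage: check_hosts_entry HOSTS_FILE HOSTNAME IP
check_hosts_entry() {
  # Skip comment lines; succeed when a line starts with the IP and lists the hostname.
  awk -v ip="$3" -v host="$2" '
    /^[[:space:]]*#/ { next }
    $1 == ip {
      for (i = 2; i <= NF; i++) if ($i == host) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Inside the VM (after `vagrant ssh`), using the values from the Vagrantfile:
if check_hosts_entry /etc/hosts cloudera 192.168.10.23; then
  echo "hosts entry OK"
else
  echo "hosts entry missing or wrong"
fi
```

If the check fails, compare the output of `hostname` and the VM's interfaces with the entries in /etc/hosts before restarting any services.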
You should see the masked policy in action!
In this blog we've learned:
You can play with the services and install other parcels like Kafka/NiFi/Kudu to create a streaming ingestion pipeline and query it in real time with Spark/Impala. Of course, you'll need more resources for that; you can change them at the beginning, during the VM configuration.