Created on 08-03-2020 05:41 PM - edited 01-15-2022 05:59 AM
Cloudera Data Platform (CDP) Base doesn't have a Quickstart/Sandbox VM like the ones for the CDH/HDP releases, which helped a lot of people (including me) learn the open-source components and see the community's improvements in CDP Runtime.
The objective of this tutorial is to create a VM from scratch with some automation (a shell script and a Cloudera template), so that anyone who wants to use or learn Cloudera CDP can do so in a Sandbox/Quickstart-like environment on their own machine.
This exercise is performed on macOS, but you can also install Vagrant/VirtualBox on Windows/Linux machines (https://www.vagrantup.com/docs/installation).
The versions below were tested at the time of writing this blog and may change in the future.
The machine needs to have at least:
$ brew install --cask virtualbox
$ brew install --cask vagrant
$ brew install --cask vagrant-manager
The manager is optional and can be used to manage your Virtual Machines on the menu bar.
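Before moving on, it can help to confirm that both tools ended up on the PATH. The snippet below is only a minimal sketch and not part of the original setup: the helper name `need_cmd` is hypothetical, and it assumes the VirtualBox install provides the `VBoxManage` command and the Vagrant install provides `vagrant`.

```shell
#!/bin/sh
# need_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
need_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool this tutorial relies on.
for tool in VBoxManage vagrant; do
  if need_cmd "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

If either tool is reported as missing, opening a new terminal session after the install is usually worth trying before reinstalling.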
$ cd ~
$ mkdir cdpvm
$ cd cdpvm
$ wget https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box
$ wget https://raw.githubusercontent.com/carrossoni/CDPDCTrial/master/scripts/VMSetup.sh
$ vagrant box add CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box --name centos7
$ vagrant plugin install vagrant-disksize
$ vagrant init centos7
config.vm.network "public_network"
config.vm.network :forwarded_port, guest: 7180, host: 7180
config.vm.network :forwarded_port, guest: 8889, host: 8889
config.vm.network :forwarded_port, guest: 9870, host: 9870
config.vm.network :forwarded_port, guest: 6080, host: 6080
config.vm.network :forwarded_port, guest: 21050, host: 21050
config.vm.hostname = "localhost"
config.disksize.size = "80GB"
config.vm.provision "shell", path: "VMSetup.sh"
config.vm.provider "virtualbox" do |vb|
  # Display the VirtualBox GUI when booting the machine
  vb.gui = true
  # Customize the amount of memory on the VM:
  vb.memory = "12024"
  vb.cpus = "8"
end
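After editing the Vagrantfile, a quick way to double-check the forwarded_port entries above is to list them from the file. This is a rough sketch under a few assumptions: the helper name `list_forwarded_ports` is hypothetical, each mapping sits on a single line with `guest:` before `host:`, and commented-out lines (such as the examples that `vagrant init` generates) should be ignored.

```shell
#!/bin/sh
# list_forwarded_ports is a hypothetical helper: it prints "guest -> host" for
# every active (non-commented) forwarded_port line in the given Vagrantfile.
list_forwarded_ports() {
  sed -n '
    /^[ 	]*#/d
    s/.*forwarded_port.*guest: *\([0-9][0-9]*\).*host: *\([0-9][0-9]*\).*/\1 -> \2/p
  ' "$1"
}

# Only run against a Vagrantfile if one exists in the current directory.
if [ -f Vagrantfile ]; then
  list_forwarded_ports Vagrantfile
fi
```

Vagrant also ships a `vagrant validate` command that checks the Vagrantfile for configuration errors before the VM is started.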
$ vagrant up
Hue, HDFS, Hive Metastore, Impala, Ranger, ZooKeeper
$ vagrant ssh
After connecting via ssh in either scenario, you can sudo on the box and start looking around the machine. Check that the hostname and IP in /etc/hosts are configured properly; this is the most common issue, since it depends on your machine's network.
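To make the /etc/hosts check concrete, here is a small sketch that can be run inside the VM. The helper name `hosts_ip_for` is hypothetical; it prints the first IP that a hosts-format file maps to a given hostname, so you can compare it with the addresses the box actually has (for example from `ip addr`).

```shell
#!/bin/sh
# hosts_ip_for is a hypothetical helper: it prints the first IP mapped to the
# given hostname in a hosts-format file, ignoring comments.
hosts_ip_for() {
  awk -v name="$2" '
    /^[ \t]*#/ { next }                  # skip full-line comments
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break             # stop at a trailing comment
        if ($i == name) { print $1; exit }
      }
    }
  ' "$1"
}

# Inside the VM: show which IP the machine's own hostname resolves to.
if [ -r /etc/hosts ]; then
  echo "hosts entry for $(uname -n): $(hosts_ip_for /etc/hosts "$(uname -n)")"
fi
```

An empty result means the hostname has no entry at all, which is worth fixing before retrying the cluster setup.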
If you get an error message after the template import, Cloudera Manager can show what's happening. Work through the error and then resume the cluster template import from the running commands tab. At this step it is normally a matter of viewing the logs and/or checking whether enough resources are available; at the end, you can restart the cluster in case something was stuck. This is normal, since we are working in a constrained environment.
Our environment is now ready to work with and learn more about CDP!
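When checking whether resources are the problem, available memory is a reasonable first thing to look at from inside the VM. The following is a minimal sketch: the helper name `mem_available_mb` is hypothetical, and it assumes the guest kernel exposes a `MemAvailable` line in /proc/meminfo.

```shell
#!/bin/sh
# mem_available_mb is a hypothetical helper: it prints MemAvailable in whole MB
# from a file in /proc/meminfo format (values there are reported in kB).
mem_available_mb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$1"
}

# Inside the VM: show how much memory is currently available.
if [ -r /proc/meminfo ]; then
  echo "Available memory: $(mem_available_mb /proc/meminfo) MB"
fi
```

If the number stays very low while services are starting, raising vb.memory in the Vagrantfile is the more likely fix than retrying the import.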
You should see the masked policy in action!
In this blog we've learned:
You can play with the services and install other parcels such as Kafka/NiFi/Kudu to create a streaming ingestion pipeline and query it in real time with Spark/Impala. Of course, you'll need more resources for that, which can be changed at the beginning, in the VM configuration.