Created on 08-03-2020 05:41 PM - edited 01-15-2022 05:59 AM
Cloudera Data Platform Base doesn't have a Quickstart/Sandbox VM like the ones for the CDH/HDP releases. Those VMs helped a lot of people (including me) learn more about the open-source components, and a similar one would also let you see the community improvements in CDP Runtime.
The objective of this tutorial is to create a VM from scratch with some automation (a shell script and a Cloudera template), to help anyone who wants to use and/or learn Cloudera CDP in a Sandbox/Quickstart-like environment on their own machine.
This exercise was performed on macOS, but you can also install Vagrant/VirtualBox on Windows/Linux machines (https://www.vagrantup.com/docs/installation).
The versions below were tested at the moment of writing this blog and may change in the future.
The machine needs to have at least:
This is the software that we'll use to run our virtualized environment. To download and install VirtualBox and Vagrant, execute the following commands on your host machine:
For Mac
$ brew cask install virtualbox
$ brew cask install vagrant
$ brew cask install vagrant-manager
The manager is optional and can be used to manage your Virtual Machines on the menu bar.
For Windows
Download Virtualbox here and Vagrant here and install the files. Also, take a look at this instruction regarding hypervisor.
For Linux
Follow the VirtualBox and Vagrant installation instructions for your Linux distribution.
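Whatever the platform, it may be worth confirming that both tools ended up on the PATH before continuing. The sketch below is only an illustration: the helper name check_tools is made up for this example, and it assumes the VirtualBox CLI is installed under its usual name, VBoxManage.

```shell
#!/bin/sh
# Report whether each required tool is on the PATH.
# Prints one "ok <tool>" or "missing <tool>" line per argument and
# returns non-zero if anything is missing.
# (check_tools is a hypothetical helper, not part of Vagrant or VirtualBox.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok $tool"
        else
            echo "missing $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_tools vagrant VBoxManage
```

If either tool is reported as missing, reinstall it (or open a new terminal so the PATH is refreshed) before moving on.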
$ cd ~
$ mkdir cdpvm
$ cd cdpvm
$ wget https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box
$ wget https://raw.githubusercontent.com/carrossoni/CDPDCTrial/master/scripts/VMSetup.sh
$ vagrant box add CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box --name centos7
$ vagrant plugin install vagrant-disksize
$ vagrant init centos7
config.vm.network "public_network"
config.vm.network :forwarded_port, guest: 7180, host: 7180
config.vm.network :forwarded_port, guest: 8889, host: 8889
config.vm.network :forwarded_port, guest: 9870, host: 9870
config.vm.network :forwarded_port, guest: 6080, host: 6080
config.vm.network :forwarded_port, guest: 21050, host: 21050
config.vm.hostname = "localhost"
config.disksize.size = "80GB"
config.vm.provision "shell", path: "VMSetup.sh"
config.vm.provider "virtualbox" do |vb|
  # Display the VirtualBox GUI when booting the machine
  vb.gui = true
  # Customize the amount of memory on the VM:
  vb.memory = "12024"
  vb.cpus = "8"
end
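Each forwarded port has to be free on the host, otherwise the VM may fail to start or Vagrant may complain about a port collision. As a rough sketch, the hypothetical helper below (forwarded_host_ports is a name invented for this example) lists the host-side ports declared in a Vagrantfile so you can check them first. It assumes the forwarded_port lines are written in the single-line style shown above.

```shell
#!/bin/sh
# Print the host side of every forwarded_port line in a Vagrantfile,
# one port per line. Commented-out lines are skipped.
# (forwarded_host_ports is a hypothetical helper for illustration.)
forwarded_host_ports() {
    vagrantfile=${1:-Vagrantfile}
    grep -v '^[[:space:]]*#' "$vagrantfile" \
        | sed -n 's/.*forwarded_port.*host:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example: forwarded_host_ports ~/cdpvm/Vagrantfile
```

On macOS or Linux you could then loop over the result and look for existing listeners, for example with lsof -nP -iTCP:PORT -sTCP:LISTEN, assuming lsof is installed.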
$ vagrant up
HUE
HDFS
Hive Metastore
Impala
Ranger
Zookeeper
$ vagrant ssh
After connecting over SSH in either scenario, you can use sudo on the box and start exploring the machine. Check that the hostname and IP in /etc/hosts are configured properly; this is the most common issue, since it depends on your machine's network.
If you get an error message after the template import, Cloudera Manager can show what's happening. Work on the error and then resume the cluster template import from the Running Commands tab. At this step it is normally a matter of viewing the logs and/or checking whether enough resources are available; finally, you can restart the cluster to see if something was stuck. This is normal since we are working in a constrained environment.
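To make the /etc/hosts check concrete, here is a small sketch that prints the IP address(es) a hosts file maps to a given name. The helper name hosts_ips_for is hypothetical and exists only for this example.

```shell
#!/bin/sh
# Print every IP address that a hosts file maps to the given name.
# Comments and blank lines are ignored.
# (hosts_ips_for is a hypothetical helper for illustration.)
hosts_ips_for() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }            # drop trailing comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; break }
        }
    ' "$hosts_file"
}

# Example, run inside the VM: hosts_ips_for "$(hostname)"
```

The address printed should match the one the VM actually uses on its network interface (ip addr shows it); if the name maps to nothing, or to something unexpected, fix /etc/hosts before retrying.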
http://localhost:7180
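Cloudera Manager may take a while to start answering on that port after the VM boots. A minimal sketch of a wait loop follows; retry_until is a hypothetical helper written for this example, and the curl call in the comment assumes curl is installed on the host.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry_until <attempts> <delay-seconds> <command> [args...]
# (retry_until is a hypothetical helper for illustration.)
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        # Only sleep if another attempt is still coming.
        [ "$n" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example (assumes curl is available on the host):
#   retry_until 30 10 curl -sf -o /dev/null http://localhost:7180
```

The attempt count and delay in the example are arbitrary; adjust them to how constrained your machine is.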
Our environment is now ready to work with and to learn more about CDP!
You should see the masked policy in action!
In this blog we've learned:
You can play with the services, install other parcels like Kafka/NiFi/Kudu to create a streaming ingestion pipeline, and query in real time with Spark/Impala. Of course, for that you'll need more resources, which can be changed at the beginning, during the VM configuration.
Created on 07-29-2021 12:10 PM
Hi @duhizjame
As you may have seen, network issues are the most common problem, since they depend on other variables.
After reading through your process, I understand that you're stuck in the download/distribute phase. Normally this happens because of insufficient disk space, since all parcels have to be downloaded and then used for the installation. Since the parcels are already in /opt/cloudera/parcel-repo, the download itself went fine.
Do the logs in /var/log/cloudera-scm-server show anything?
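A quick way to rule disk space in or out is to look at the free space on the filesystems involved. The sketch below uses two hypothetical helpers (avail_kb and has_free_kb, names invented here); the threshold you pass is your own choice, not a Cloudera requirement.

```shell
#!/bin/sh
# Print the free space, in kilobytes, of the filesystem holding a path.
# (avail_kb and has_free_kb are hypothetical helpers for illustration.)
avail_kb() {
    # -P forces the portable one-line-per-filesystem format,
    # where column 4 is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding <path> has at least <min-kb> free.
has_free_kb() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Example, run inside the VM:
#   avail_kb /opt/cloudera/parcel-repo
#   avail_kb /opt/cloudera/parcels
```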
Regards,
Luiz
Created on 07-30-2021 05:49 AM
I was also thinking that the network is the problem, not disk space, since the host's health is reported as unknown all the time.
The server logs don't show anything unusual. The agent logs show that it is heartbeating on host:
[root@cloudera ~]# netstat -an | grep -e 9000 -e 9001
tcp 0 0 10.0.2.15:9000 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN
Could it be a problem that port 9001 is open on localhost and not on the cloudera host (10.0.2.15)?
I do not know where the setting for port 9001 is in the agent's config file.
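One way to see which settings the agent is actually using is to list the uncommented key=value lines of its config file. This is only a sketch: active_settings is a hypothetical helper, and /etc/cloudera-scm-agent/config.ini is assumed to be the agent config location on this VM. Whether the 9001 listener is configured there at all is not something this snippet can confirm.

```shell
#!/bin/sh
# Print the active (uncommented) key=value settings from an INI-style file.
# Section headers, comments and blank lines are skipped.
# (active_settings is a hypothetical helper for illustration.)
active_settings() {
    grep -E '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=' "$1"
}

# Example (path is an assumption about this VM):
#   active_settings /etc/cloudera-scm-agent/config.ini | grep -i port
```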
When I try to install the host manually, this is the health inspector log:
Inspect Hosts for Correctness
Validations
Inspector ran on all 1 hosts.
Individual hosts resolved their own hostnames correctly.
No errors were found while looking for conflicting init scripts.
The following errors were found while checking /etc/hosts...
In /etc/hosts on cloudera, the hostname cloudera is mapped to cloudera, whereas it should be mapped to 10.0.2.15.
All hosts resolved localhost to 127.0.0.1.
All hosts checked resolved each other's hostnames correctly and in a timely manner.
Host clocks are approximately in sync (within ten minutes).
Host time zones are consistent across the cluster.
The user hdfs is missing on the following hosts:
cloudera
The user mapred is missing on the following hosts:
The user zookeeper is missing on the following hosts:
The user oozie is missing on the following hosts:
The user hbase is missing on the following hosts:
The user hue is missing on the following hosts:
The user sqoop is missing on the following hosts:
The user impala is missing on the following hosts:
The user sentry is missing on the following hosts:
The group hdfs is missing on the following hosts:
The group mapred is missing on the following hosts:
The group zookeeper is missing on the following hosts:
The group oozie is missing on the following hosts:
The group hbase is missing on the following hosts:
The group hue is missing on the following hosts:
The group hadoop is missing on the following hosts:
The group hive is missing on the following hosts:
The group sqoop is missing on the following hosts:
The group impala is missing on the following hosts:
The group sentry is missing on the following hosts:
No conflicts detected between packages and parcels.
No kernel versions that are known to be bad are running.
No problems were found with /proc/sys/vm/swappiness on any of the hosts.
Transparent Huge Page Compaction is enabled and can cause significant performance problems. Run "echo never > /sys/kernel/mm/transparent_hugepage/defrag" and "echo never > /sys/kernel/mm/transparent_hugepage/enabled" to disable this, and then add the same command to an init script such as /etc/rc.local so it will be set on system reboot. The following hosts are affected:
cloudera
Hue Python version dependency is satisfied.
Hue Psycopg2 version for PostgreSQL is satisfied for both CDH 5 and CDH 6.
1 hosts are reporting with NONE version
All checked hosts in each cluster are running the same version of components.
All managed hosts have consistent versions of Java.
All checked Cloudera Management Daemons versions are consistent with the server.
All checked Cloudera Management Agents versions are consistent with the server.
Version Summary
Hosts that do not belong to any cluster
All Hosts
cloudera
Component | Version | Hosts | Release | Version
Supervisord | 3.4.0 | cloudera | Unavailable | Not applicable
Cloudera Manager Agent | 7.1.4 | cloudera | 6363010.el7 | Not applicable
Cloudera Manager Management Daemons | 7.1.4 | cloudera | 6363010.el7 | Not applicable
Crunch (CDH 5 only) | Unavailable | cloudera | Unavailable | Not installed or path is incorrect
flume | Unavailable | cloudera | Unavailable | Not installed or path is incorrect
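Regarding the Transparent Huge Page warning in the inspector output above: the two sysfs files it names show the active mode in square brackets. As a small sketch, the hypothetical helper below (thp_mode, a name invented for this example) extracts that active value so you can confirm whether the suggested "echo never" commands took effect.

```shell
#!/bin/sh
# Print the active mode from a transparent_hugepage sysfs file, i.e. the
# value shown in square brackets (for example "always madvise [never]").
# (thp_mode is a hypothetical helper for illustration.)
thp_mode() {
    sed -n 's/.*\[\([a-z+]*\)\].*/\1/p' "$1"
}

# Example, run inside the VM:
#   thp_mode /sys/kernel/mm/transparent_hugepage/enabled
#   thp_mode /sys/kernel/mm/transparent_hugepage/defrag
```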
Created on 02-13-2022 02:40 PM
Nice article, @carrossoni !
I know it's been a while, but just saw this for the first time 🙂
Created on 10-12-2022 11:12 PM
Nice article, but it is not working any more.
The setup fails to find the MariaDB repo, as MariaDB 10 was archived.