Created on 08-03-2020 05:41 PM - edited 08-06-2020 09:32 AM
Cloudera Data Platform Data Center (CDP DC) doesn't have a Quickstart/Sandbox VM like the ones for the CDH/HDP releases, which helped a lot of people (including me) learn about the open-source components. A sandbox is also a good way to see the community's improvements in CDP Runtime.
The objective of this tutorial is to create a VM from scratch with some automation (a shell script and a Cloudera template), so that anyone who wants to use or learn Cloudera CDP can do it in a Sandbox/Quickstart-like environment on their own machine.
This exercise was performed on macOS, but you can also install Vagrant and VirtualBox on Windows and Linux machines (https://www.vagrantup.com/docs/installation).
The versions below were tested at the time of writing and may change in the future.
The machine needs to have at least:
We'll use VirtualBox and Vagrant to run our virtualized environment. To download and install them on your host machine, follow the instructions for your operating system:
For Mac
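$ brew cask install virtualbox
$ brew cask install vagrant
$ brew cask install vagrant-manager
On recent Homebrew versions, where the brew cask subcommand has been removed, use brew install --cask instead of brew cask install.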
Vagrant Manager is optional; it lets you manage your virtual machines from the menu bar.
For Windows
Download and install VirtualBox and Vagrant. Also, take a look at the Vagrant installation notes about running more than one hypervisor (for example, Hyper-V) on the same machine.
For Linux
Follow the VirtualBox and Vagrant installation instructions for your Linux distribution.
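Whichever OS you use, a quick way to confirm that both tools ended up on your PATH is sketched below. This is a minimal Python sketch, not part of the tutorial's scripts: the helper names are hypothetical, and it relies only on the standard vagrant --version and VBoxManage --version commands.

```python
import shutil
import subprocess

# CLIs this tutorial relies on: Vagrant itself and VirtualBox's VBoxManage.
REQUIRED_TOOLS = {
    "vagrant": ["vagrant", "--version"],
    "VBoxManage": ["VBoxManage", "--version"],
}


def check_tools(which=shutil.which, run=subprocess.run):
    """Return {tool: version string, or None if the tool is not on PATH}."""
    report = {}
    for tool, version_cmd in REQUIRED_TOOLS.items():
        if which(tool) is None:
            report[tool] = None
            continue
        result = run(version_cmd, capture_output=True, text=True, check=False)
        # Some tools print their version on stderr, so fall back to it.
        report[tool] = result.stdout.strip() or result.stderr.strip()
    return report


def missing_tools(report):
    """List the tools that still need to be installed."""
    return [tool for tool, version in report.items() if version is None]
```

Running print(check_tools()) should show a version string for each tool; None means that tool still needs to be installed. The which and run parameters exist only so the checks can be exercised without the real binaries.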
$ cd ~
$ mkdir cdpvm
$ cd cdpvm
$ wget https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box
$ wget https://raw.githubusercontent.com/carrossoni/CDPDCTrial/master/scripts/VMSetup.sh
$ vagrant box add CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box --name centos7
$ vagrant plugin install vagrant-disksize
$ vagrant init centos7
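Before editing the Vagrantfile, you can double-check that the box was registered under the name centos7 and that the vagrant-disksize plugin is installed. The sketch below assumes that vagrant box list and vagrant plugin list print one entry per line, starting with the box or plugin name (for example "centos7 (virtualbox, 0)"); the function names are hypothetical.

```python
import subprocess


def first_column(list_output):
    """Return the first token of each non-empty line (the box or plugin name)."""
    return [line.split()[0] for line in list_output.splitlines() if line.strip()]


def vagrant_ready(box="centos7", plugin="vagrant-disksize", run=subprocess.run):
    """Check that the box was added and the disk-size plugin is installed."""
    boxes = run(["vagrant", "box", "list"], capture_output=True, text=True).stdout
    plugins = run(["vagrant", "plugin", "list"], capture_output=True, text=True).stdout
    return box in first_column(boxes) and plugin in first_column(plugins)
```

If vagrant_ready() returns False, rerun the vagrant box add or vagrant plugin install step before continuing.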
Edit the Vagrantfile created by vagrant init so that it contains the following configuration:
Vagrant.configure("2") do |config|
  config.vm.box = "centos7"
  config.vm.network "private_network", ip: "192.168.10.23"
  config.vm.network "public_network"
  config.vm.network :forwarded_port, guest: 7180, host: 7180
  config.vm.hostname = "cloudera"
  config.disksize.size = "100GB"
  config.vm.provision "shell", path: "VMSetup.sh"
  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
    vb.gui = true
    # Customize the amount of memory (in MB) and the number of CPUs of the VM:
    vb.memory = "10024"
    vb.cpus = "8"
  end
end
$ vagrant up
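Provisioning takes a while, because VMSetup.sh sets up Cloudera Manager and then creates the cluster (create_cluster.py). From another terminal on the host, you can poll the forwarded port 7180 to see when the Cloudera Manager web server starts answering. This is a minimal sketch that assumes the port forwarding from the Vagrantfile; an open port only means the server is listening, not that the cluster is fully deployed.

```python
import socket
import time


def wait_for_port(host="localhost", port=7180, timeout=1800.0, interval=10.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True as soon as something accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves a process is listening on the port.
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port() blocks for up to 30 minutes by default; once it returns True, open http://localhost:7180 in your browser.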
HUE
HDFS
Impala
Ranger
Zookeeper
http://localhost:7180
Our environment is now ready for us to work with and learn more about CDP!
You should see the masked policy in action!
In this blog we've learned:
You can play with the services, install other parcels such as Kafka, NiFi, or Kudu to build a streaming ingestion pipeline, and query the data in real time with Spark or Impala. Of course, for that you'll need more resources, which you can set up front in the VM configuration (vb.memory and vb.cpus in the Vagrantfile).
Created on 08-06-2020 02:27 AM - edited 08-06-2020 06:19 AM
I just tried to follow the instructions and received the error below when running the vagrant up command. Any help is much appreciated.
default: 'result_data_url': 'http://cloudera:7180/cmf/command/12/download',
default: 'result_message': 'Failed to complete installation.',
default: 'role_ref': None,
default: 'service_ref': None,
default: 'start_time': '2020-08-06T09:22:58.606Z',
default: 'success': False}
default: Traceback (most recent call last):
default: File "/root/CDPDCTrial/scripts/create_cluster.py", line 76, in <module>
default: mgmt_api.auto_assign_roles() # needed?
default: File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 65, in auto_assign_roles
default: (data) = self.auto_assign_roles_with_http_info(**kwargs)
default: File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 131, in auto_assign_roles_with_http_info
default: collection_formats=collection_formats)
default: File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 326, in call_api
default: _return_http_data_only, collection_formats, _preload_content, _request_timeout)
default: File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 153, in __call_api
default: _request_timeout=_request_timeout)
default: File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 379, in request
default: body=body)
default: File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 273, in PUT
default: body=body)
default: File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 219, in request
default: raise ApiException(http_resp=r)
default: cm_client.rest.ApiException
default: : (400)
default: Reason: Bad Request
default: HTTP response headers: HTTPHeaderDict({'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'SESSION=9d9bab89-f03e-4fd6-aba7-8eaa64a38128;Path=/;HttpOnly', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Date': 'Thu, 06 Aug 2020 09:24:24 GMT', 'X-Frame-Options': 'DENY', 'Content-Type': 'application/json;charset=utf-8'})
default: HTTP response body: {
default: "message" : "Deployment should contain hosts."
default: }
default: usermod: group 'hadoop' does not exist
default: sudo: unknown user: hdfs
default: sudo: unable to initialize policy plugin
default: sudo: unknown user: hdfs
default: sudo: unable to initialize policy plugin
default: sudo: unknown user: hdfs
default: sudo: unable to initialize policy plugin
default: sudo: unknown user: hdfs
default: sudo: unable to initialize policy plugin
default: sudo: unknown user: hdfs
default: sudo: unable to initialize policy plugin
Since create_cluster.py failed, I SSHed into the Vagrant box, ran it manually, and received the error message below:
[root@cloudera ~]# python ~/CDPDCTrial/scripts/create_cluster.py ~/CDPDCTrial/conf/cdpsandbox.json
{'active': False,
'can_retry': True,
'children': {'items': []},
'cluster_ref': None,
'end_time': '2020-08-06T11:10:10.014Z',
'host_ref': None,
'id': 16.0,
'name': 'GlobalHostInstall',
'parent': None,
'result_data_url': 'http://cloudera:7180/cmf/command/16/download',
'result_message': 'Failed to complete installation.',
'role_ref': None,
'service_ref': None,
'start_time': '2020-08-06T11:08:49.716Z',
'success': False}
Traceback (most recent call last):
File "/root/CDPDCTrial/scripts/create_cluster.py", line 76, in <module>
mgmt_api.auto_assign_roles() # needed?
File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 65, in auto_assign_roles
(data) = self.auto_assign_roles_with_http_info(**kwargs)
File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 131, in auto_assign_roles_with_http_info
collection_formats=collection_formats)
File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 326, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 153, in __call_api
_request_timeout=_request_timeout)
File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 379, in request
body=body)
File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 273, in PUT
body=body)
File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 219, in request
raise ApiException(http_resp=r)
cm_client.rest.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'SESSION=3677d4cf-2406-4989-86c9-f6f896700889;Path=/;HttpOnly', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Date': 'Thu, 06 Aug 2020 11:10:10 GMT', 'X-Frame-Options': 'DENY', 'Content-Type': 'application/json;charset=utf-8'})
HTTP response body: {
"message" : "Deployment should contain hosts."
}
I checked the file /root/CDPDCTrial/scripts/create_cluster.py and can confirm that the hostname was changed to cloudera:
instargs = cm_client.ApiHostInstallArguments(
host_names=['cloudera'],
user_name='root',
private_key=key,
cm_repo_url='https://archive.cloudera.com/cm7/7.1.1/',
java_install_strategy='NONE',
ssh_port=22,
passphrase='')
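In both tracebacks above, the GlobalHostInstall command had already finished with 'success': False and 'Failed to complete installation.' before auto_assign_roles() was called, so "Deployment should contain hosts." is a consequence of the failed host install rather than its root cause. A hypothetical guard like the one below (not part of create_cluster.py) would stop at the first failed command and point at its result_data_url. It works on dictionaries with the same keys as the ones printed in the logs.

```python
def command_error(command):
    """Return None if a finished command succeeded, otherwise a readable error message.

    `command` is a dict with the keys printed in the logs above
    (name, id, active, success, result_message, result_data_url).
    """
    if command.get("active"):
        return f"{command.get('name')} is still running"
    if command.get("success"):
        return None
    return (
        f"{command.get('name')} (id {command.get('id')}) failed: "
        f"{command.get('result_message')} Details: {command.get('result_data_url')}"
    )
```

For example, if cmd is the command object returned by the host-install call, command_error(cmd.to_dict()) gives a message to print before exiting; exactly where to hook this into create_cluster.py is an assumption.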
Created on 08-07-2020 06:17 AM
Hi Akhil,
It seems that the download didn't complete and the host isn't in a healthy state. Can you log in to http://localhost:7180 with admin/admin?
If so, can you check the status of the host under Hosts?
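If you'd rather check from a terminal, the Cloudera Manager REST API also lists the hosts and their health summaries. The minimal Python sketch below assumes the default admin/admin credentials, the forwarded port 7180, that /api/version returns the API version as plain text, and that /api/<version>/hosts?view=full returns JSON with an items list whose entries carry hostname and healthSummary. An empty result means no host has been registered, which matches the "Deployment should contain hosts." error.

```python
import base64
import json
import urllib.request

# Assumption: Cloudera Manager reachable through the forwarded port.
CM_URL = "http://localhost:7180"


def cm_get(path, user="admin", password="admin", base=CM_URL):
    """GET a Cloudera Manager API path using HTTP basic auth and return the body as text."""
    request = urllib.request.Request(base + path)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def host_health(hosts_body):
    """Map hostname -> healthSummary from a /hosts?view=full JSON response body."""
    payload = json.loads(hosts_body)
    return {
        host.get("hostname"): host.get("healthSummary", "UNKNOWN")
        for host in payload.get("items", [])
    }


def check_hosts():
    """Ask the server for its API version, then fetch and summarize the host list."""
    version = cm_get("/api/version").strip()  # assumed plain-text version string
    return host_health(cm_get(f"/api/{version}/hosts?view=full"))
```

Calling print(check_hosts()) from the host should list the cloudera host with its health summary once the agent has registered.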
Thanks,
Luiz
Created on 08-08-2020 04:09 AM - edited 08-08-2020 04:13 AM
@carrossoni, when I open Cloudera Manager, it takes me to the installation page, which means it hasn't registered the host yet.
The error also confirms this: "Deployment should contain hosts."
It must have something to do with the function below:
mgmt_api.auto_assign_roles() # needed?
If I comment it out and run the following via "vagrant ssh", I get a different error:
python ~/CDPDCTrial/scripts/create_cluster.py ~/CDPDCTrial/conf/cdpsandbox.json
{'active': False,
'can_retry': True,
'children': {'items': []},
'cluster_ref': None,
'end_time': '2020-08-08T08:06:09.975Z',
'host_ref': None,
'id': 16.0,
'name': 'GlobalHostInstall',
'parent': None,
'result_data_url': 'http://cloudera:7180/cmf/command/16/download',
'result_message': 'Failed to complete installation.',
'role_ref': None,
'service_ref': None,
'start_time': '2020-08-08T08:04:44.671Z',
'success': False}
Traceback (most recent call last):
File "/root/CDPDCTrial/scripts/create_cluster.py", line 78, in <module>
mgmt_api.setup_cms(body=api_service)
File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 1013, in setup_cms
(data) = self.setup_cms_with_http_info(**kwargs)
File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 1091, in setup_cms_with_http_info
collection_formats=collection_formats)
File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 326, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 153, in __call_api
_request_timeout=_request_timeout)
File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 379, in request
body=body)
File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 273, in PUT
body=body)
File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 219, in request
raise ApiException(http_resp=r)
cm_client.rest.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'SESSION=b8021e6e-91e8-4b18-895e-8cd1f3a287a0;Path=/;HttpOnly', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Date': 'Sat, 08 Aug 2020 08:06:10 GMT', 'X-Frame-Options': 'DENY', 'Content-Type': 'application/json;charset=utf-8'})
HTTP response body: {
"message" : "Cannot find a suitable default host for cluster 'null'."
}
Created on 08-08-2020 09:14 AM
Hi @AkhilTech ,
The primary problem is that the host isn't in a healthy state; host health is checked before the cluster is provisioned. This can happen if there aren't enough resources to provision CDP.
If there are enough resources, you can:
Check whether the cloudera-scm-agent service is configured properly:
- Check that the file /etc/cloudera-scm-agent/config.ini has server_host= and listening_ip= lines matching the output of the commands "hostname" (for server_host) and "host cloudera" (for listening_ip, the IP address).
- Check that the cloudera-scm-agent process is running with "sudo service cloudera-scm-agent status".
- Restart the cloudera-scm-agent with "sudo service cloudera-scm-agent restart".
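The first check can also be scripted. The sketch below compares the [General] section of config.ini with the machine's hostname and the IP address that hostname resolves to. It assumes the file is plain INI with a [General] section; socket.gethostbyname() goes through /etc/hosts as well as DNS, so if it returns a loopback address, compare against the output of "host cloudera" instead. The function names are hypothetical.

```python
import configparser
import socket

AGENT_INI = "/etc/cloudera-scm-agent/config.ini"


def agent_config_problems(config_text, expected_host, expected_ip):
    """Compare server_host/listening_ip in [General] with the expected values.

    Returns a list of human-readable mismatches; an empty list means both match.
    """
    # strict=False tolerates duplicate keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    general = parser["General"] if parser.has_section("General") else {}
    problems = []
    for key, expected in (("server_host", expected_host), ("listening_ip", expected_ip)):
        actual = general.get(key)  # None when the key is missing or commented out
        if actual != expected:
            problems.append(f"{key}={actual!r}, expected {expected!r}")
    return problems


def check_local_agent(path=AGENT_INI):
    """Run the comparison on this machine (meant to be run inside the VM)."""
    host = socket.gethostname()
    ip = socket.gethostbyname(host)  # resolves via /etc/hosts and DNS
    with open(path, encoding="utf-8") as handle:
        return agent_config_problems(handle.read(), host, ip)
```

Inside the VM, print(check_local_agent()) should print an empty list once both values are right. Python 3 is assumed; the CentOS 7 box ships Python 2.7, so you may need to install python3 first.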
After this you can check the logs for error messages inside the VM:
- /var/log/cloudera-scm-server/cloudera-scm-server.log
- /var/log/cloudera-scm-agent/cloudera-scm-agent.log
Especially in the cloudera-scm-agent log, look at the last lines: they should contain the error messages explaining why your host isn't healthy.
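To pull the relevant lines out of those logs quickly, a small Python sketch like the one below reads the last few hundred lines of a log and keeps the ones that mention an error level. The level keywords it searches for (ERROR, CRITICAL) and the tail size are assumptions; adjust them to what you see in the files. Python 3 is assumed.

```python
from collections import deque

# Assumed level keywords; adjust to what the Cloudera logs actually contain.
LEVELS = ("ERROR", "CRITICAL")


def error_lines(lines, levels=LEVELS):
    """Keep only the lines that mention one of the given level keywords."""
    return [line.rstrip("\n") for line in lines if any(level in line for level in levels)]


def recent_errors(path, tail=500, levels=LEVELS):
    """Read the last `tail` lines of a log file and return the matching error lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        last_lines = deque(handle, maxlen=tail)  # keeps only the final `tail` lines
    return error_lines(last_lines, levels)
```

For example, inside the VM: for line in recent_errors("/var/log/cloudera-scm-agent/cloudera-scm-agent.log"): print(line)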
You can also use Cloudera Manager to identify what's happening and restart the agent services. If you click the Cloudera logo at the top left, you should see the home page, and from there you can go to "Hosts" to see what's happening:
Click the Cloudera Manager icon at the top left.
Click Hosts --> All Hosts.
Click the host cloudera and check its health status.
If it isn't healthy, first try restarting the Cloudera Management Service (Actions -> Restart) and wait for the results. If there are errors, follow the messages to the logs to see what may be causing them. Once the host is up, you can run the Python provisioning script (create_cluster.py) again.
Let me know if this helps,
Thanks,
Luiz
Created on 08-09-2020 08:15 AM
@carrossoni, thanks for the input. I checked the file /etc/cloudera-scm-agent/config.ini, and it had:
server_host=10.0.2.15
listening_ip=found:
I changed the values as follows:
server_host=cloudera
listening_ip=10.0.2.15
Then I restarted the agent with "sudo service cloudera-scm-agent restart".
After waiting almost 90 minutes, the parcel download completed, and I can see the services up and running in CM.
Thanks for your kind support.
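For reference, the config.ini edit described above can also be scripted. Python's configparser would discard the file's comments when writing it back, so the sketch below rewrites only the matching active key=value lines in [General] and leaves everything else untouched. It is a minimal sketch assuming simple key=value lines; the function name is hypothetical, and you should keep a backup of the original file.

```python
import re

# An active (uncommented) option line such as "server_host=10.0.2.15".
OPTION_LINE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*=")


def set_general_options(config_text, updates):
    """Return config_text with active key=value lines in [General] set to `updates`.

    Comments and other sections are left untouched; keys not present as active
    lines are inserted right after the [General] header (created if missing).
    """
    lines = config_text.splitlines()
    section = None
    general_idx = None
    seen = set()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1].strip()
            if section == "General" and general_idx is None:
                general_idx = i
            continue
        match = OPTION_LINE.match(stripped)  # commented lines start with '#' and never match
        if section == "General" and match and match.group(1) in updates:
            key = match.group(1)
            lines[i] = f"{key}={updates[key]}"
            seen.add(key)
    missing = [key for key in updates if key not in seen]
    if missing:
        if general_idx is None:
            lines.insert(0, "[General]")
            general_idx = 0
        for offset, key in enumerate(missing, start=1):
            lines.insert(general_idx + offset, f"{key}={updates[key]}")
    return "\n".join(lines) + "\n"
```

For example, set_general_options(text, {"server_host": "cloudera", "listening_ip": "10.0.2.15"}) reproduces the change from this thread; after writing the result back as root, restart the agent with "sudo service cloudera-scm-agent restart". Python 3 is assumed inside the VM.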
Created on 02-14-2021 11:56 PM
Hi @carrossoni
I followed the tutorial and I was able to start the virtual machine.
1. I entered the cdpvm folder I had created.
2. I ran 'vagrant up'.
3. I entered the username and password (I googled these because I didn't know what to enter):
3.1. username: vagrant
3.2. password: vagrant
4. It seems I got in. Please see the screen below.
I don't know what to do next. Could you please help me with step 2?
Thanks in advance
Regards