carrossoni · ‎08-03-2020

Introduction

Cloudera Data Platform (CDP) Base doesn't have a Quickstart/Sandbox VM like the ones for the CDH/HDP releases, which helped a lot of people (including me) learn more about the open-source components and see the community's improvements in CDP Runtime.

The objective of this tutorial is to create a VM from scratch with some automation (a shell script and a Cloudera template), to help anyone who wants to use and/or learn Cloudera CDP in a Sandbox/Quickstart-like environment on their own machine.

Pre-Requisites

This exercise was performed on macOS, but you can also install Vagrant/VirtualBox on Windows/Linux machines (https://www.vagrantup.com/docs/installation).

The versions below were tested at the time of writing and may change in the future.

The machine needs to have at least:

80 GB of free disk space;
12 GB of free RAM;
8 free vCPUs;
A good internet connection;
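
As a quick sanity check before starting, the disk and CPU minimums above can be compared against the host machine. This is only a sketch: find_shortfalls and check_host are hypothetical helper names, os.cpu_count() reports logical CPUs rather than "free" ones, and RAM is left out because the Python standard library has no portable call for it.

```python
import os
import shutil

# Minimums taken from the list above.
MIN_FREE_DISK_GB = 80
MIN_CPUS = 8


def find_shortfalls(free_disk_gb, cpu_count):
    """Return a message for each requirement that is not met."""
    problems = []
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(
            "free disk: %.1f GB available, %d GB needed"
            % (free_disk_gb, MIN_FREE_DISK_GB)
        )
    if cpu_count < MIN_CPUS:
        problems.append("CPUs: %d available, %d needed" % (cpu_count, MIN_CPUS))
    return problems


def check_host(path="."):
    """Measure the current machine and compare it with the minimums."""
    free_disk_gb = shutil.disk_usage(path).free / 1e9
    cpu_count = os.cpu_count() or 0  # os.cpu_count() may return None
    return find_shortfalls(free_disk_gb, cpu_count)


if __name__ == "__main__":
    for problem in check_host():
        print("WARNING:", problem)
```

Run it from the folder where the VM will live, so the free-space figure refers to the right disk.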

Install Virtualbox and Vagrant

These are the tools that we'll use to run our virtualized environment. To download and install VirtualBox and Vagrant, execute the following commands on your host machine:

For Mac

$ brew cask install virtualbox

$ brew cask install vagrant

$ brew cask install vagrant-manager

The manager is optional and can be used to manage your Virtual Machines on the menu bar.

For Windows

Download VirtualBox here and Vagrant here and run the installers. Also, take a look at these instructions regarding the hypervisor.

For Linux

Follow the VirtualBox and Vagrant instructions to install them on your Linux distribution.

Step 1: Vagrant Centos 7 Virtual Machine Setup with CDP

Download the CentOS VM image and the files needed for the setup into an empty folder. In this example, I'll use the "~/cdpvm/" folder. On your host machine, execute the following commands:

$ cd ~
$ mkdir cdpvm
$ cd cdpvm
$ wget https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box 
$ wget https://raw.githubusercontent.com/carrossoni/CDPDCTrial/master/scripts/VMSetup.sh

Go to the folder where you downloaded the VM file (cd ~/cdpvm) and initialize the virtual machine using the following commands:

$ vagrant box add CentOS-7-x86_64-Vagrant-2004_01.VirtualBox.box --name centos7
$ vagrant plugin install vagrant-disksize
$ vagrant init centos7

After this step, you should have a file called "Vagrantfile" in the same directory. Open the file with an editor (vim, for example) and, below the line config.vm.box = "centos7", add the following:

  # Bridge the VM to the host network and forward the CDP web UI ports
  config.vm.network "public_network"
  config.vm.network :forwarded_port, guest: 7180, host: 7180    # Cloudera Manager
  config.vm.network :forwarded_port, guest: 8889, host: 8889    # HUE
  config.vm.network :forwarded_port, guest: 9870, host: 9870    # HDFS NameNode web UI
  config.vm.network :forwarded_port, guest: 6080, host: 6080    # Ranger Admin
  config.vm.network :forwarded_port, guest: 21050, host: 21050  # Impala
  config.vm.hostname = "localhost"
  config.disksize.size = "80GB"
  config.vm.provision "shell", path: "VMSetup.sh"

  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
    vb.gui = true
    # Customize the amount of memory (in MB) and the number of CPUs on the VM
    vb.memory = "12024"
    vb.cpus = "8"
  end
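
Since the configuration above forwards several ports to the host, "vagrant up" can run into trouble if one of them is already taken on your machine. The sketch below reads the forwarded ports out of the Vagrantfile text and tries to bind each one locally. The helper names (forwarded_ports, host_port_is_free) are hypothetical and the regular expression only covers the one-line form used in this tutorial.

```python
import re
import socket

# Matches lines such as:
#   config.vm.network :forwarded_port, guest: 7180, host: 7180
FORWARD_RE = re.compile(r"forwarded_port.*?guest:\s*(\d+).*?host:\s*(\d+)")


def forwarded_ports(vagrantfile_text):
    """Return (guest, host) port pairs from uncommented forwarded_port lines."""
    pairs = []
    for line in vagrantfile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip the commented-out examples that vagrant init generates
        match = FORWARD_RE.search(stripped)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def host_port_is_free(port, address="127.0.0.1"):
    """Try to bind the port; a failure means something else is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    with open("Vagrantfile") as handle:
        for guest, host in forwarded_ports(handle.read()):
            state = "free" if host_port_is_free(host) else "IN USE"
            print("host port %d -> guest port %d: %s" % (host, guest, state))
```

Run it from the folder that contains the Vagrantfile, before the VM is started; once the VM is up, the forwarded ports will correctly show as in use.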

Save the file. Now we can bring up the VM:

$ vagrant up

The first time, Vagrant will ask which network interface to bridge to. Normally it's the one that connects you to the internet; in my case it's en0.

After this, the VM will be provisioned and the automated CDP installation will start. This can take up to one hour depending on your connection, since it configures the VM and installs all the components for Cloudera Manager and the services through an automated process located at https://github.com/carrossoni/CDPDCTrial/

The template and the cluster created at the end will contain the following services:

HUE

HDFS

Hive Metastore

Impala

Ranger

Zookeeper

After the install you can add more services like NiFi, Kafka, etc., depending on the resources that you've reserved for the VM.

When the execution finishes, the script exits (this takes about 30 minutes to one hour, depending on your connection, since it downloads all the packages and parcels necessary for provisioning CDP Runtime).

After this, the VM will reboot for a fresh start. Wait around 5 minutes for the services to spin up, then go to the next step.

Troubleshooting:

- If the install process failed, it's likely a problem during the VM configuration. If CM was installed, you can try going to http://localhost:7180 directly and finishing the install process manually via the Cloudera Manager UI.
- To ssh there are two options. The easy one is to go to the directory where the Vagrantfile is located (the one you used to set up the VM) and type:

$ vagrant ssh

- The other option is to configure your VM in the VirtualBox UI to attach a USB device and copy the clouderakey.pem file that was created during the automation process. You can then ssh into the machine via "ssh -i clouderakey.pem vagrant@cloudera".

After connecting with either option, you can sudo on the box and start inspecting the machine. Check that the hostname and IP in /etc/hosts are configured properly (this is the most common issue, since it depends on your machine's network).
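
To make the /etc/hosts check concrete, here is a small sketch that looks up which IP address a hostname is mapped to and treats lines without a valid IP as missing. resolve_in_hosts is a hypothetical helper name, and "cloudera" is the VM hostname used in the ssh command above.

```python
import ipaddress


def resolve_in_hosts(hosts_text, hostname):
    """Return the IP mapped to hostname in /etc/hosts-style text, or None."""
    for line in hosts_text.splitlines():
        entry = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(entry) < 2 or hostname not in entry[1:]:
            continue
        try:
            return str(ipaddress.ip_address(entry[0]))
        except ValueError:
            continue  # first field is not an IP address, so the line is malformed
    return None


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        ip = resolve_in_hosts(handle.read(), "cloudera")
    print("cloudera ->", ip if ip else "NOT MAPPED (check /etc/hosts)")
```

A result of "NOT MAPPED" suggests the entry for the hostname is missing, commented out, or was written without an IP address in front of it.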

If you get an error message after the template import, Cloudera Manager can show what's happening. Work on the error and then resume the import cluster template process from the running commands tab. At this stage it's usually a matter of viewing the logs and/or checking whether the VM has run out of resources; as a last resort you can restart the cluster in case something was stuck. This is normal since we are working in a constrained environment.

Step 2: Cloudera Data Platform Access

After the automated process, our CDP Runtime is ready (we've actually provisioned it in a single step)! In your machine's browser you can connect to CM with the following URL:

http://localhost:7180

The user/password is admin/admin. After the first login, choose the 60-day trial option and click "Continue".

The Welcome page appears. Click the Cloudera logo at the top left, since we've already added a new cluster with the automated process.

At this point all the services are started. Some errors may occur since we are working in a resource-constrained environment; the logs in Cloudera Manager usually make it easy to see what's happening. You can also suppress warning messages that aren't critical.

Our environment is now ready to work with and learn more about CDP!
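
The services can take a few minutes to come up after the reboot, so it can be handy to poll the Cloudera Manager URL instead of refreshing the browser. The following is a minimal sketch and not part of the CDPDCTrial automation: cm_responds and wait_until are hypothetical helper names, and the default of 30 attempts spaced 10 seconds apart simply mirrors the roughly 5 minute wait mentioned earlier.

```python
import time
import urllib.error
import urllib.request

CM_URL = "http://localhost:7180"  # forwarded port from the Vagrantfile


def cm_responds(url=CM_URL, timeout=5):
    """Return True if the URL answers with any HTTP response at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except OSError:
        return False  # connection refused, reset, or timed out


def wait_until(probe, attempts=30, delay=10, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling wait_until(cm_responds) returns True as soon as the login page starts answering, or False if it never does within the allotted attempts.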

HUE and Data Access

You can log in to HUE at http://localhost:8889/hue. The first time, we will use the user/password admin/admin; this will be the admin user for HUE.

As an example, I'll upload data from the California COVID-19 Testing dataset that I've downloaded to my machine.

In HUE, go to the left panel and choose "Importer" → Type = File, choose the /user/admin directory, click "Upload a file", choose your file (statewide_testing.csv) and then "Open". Now click the file that you've uploaded to go to the next step.

Click Next and HUE will infer the table name, field types, etc. You can change them or leave them as is, then click "Submit".

At the end you should see that the job succeeded. Close the job status window and click the Query button.

Now that we have our data, we can query it with Impala SQL!

(Optional) Ranger Security Masking with Impala Example

To start querying the environment with the system user/password that we've created (cloudera/cloudera), we first need to allow access for this user in Ranger. Click the Ranger service and then Ranger Admin WebUI.

Now we have the initial Ranger screen. Log in with the user/password admin/cloudera123.

In the HADOOP SQL section, click the Hadoop SQL link. We will create a new policy that allows access to the new table but shows the tested column masked with null results. To do that, click the Masking tab and then Add New Policy with the following values:

Click the Add button, then go back to the Access tab and click Add New Policy with the following parameters:

Click the Add button. Our user should now be able to select the data in this table, but only with the masked values. First we'll configure the user in HUE: in the left panel click the initial button and then "Manage Users".

Click "Add User" and enter cloudera as the username with the password cloudera. You can skip steps 2 and 3 by clicking "Add user" directly.

Log out of HUE and log in with our newly created user, go to the query editor and select the data again:

You should see the masked policy in action!

Summary

In this blog we've learned how to:

Set up a Vagrant CentOS 7 machine with VirtualBox and CDP packages
Configure CDP-DC for the first run
Configure data access
Set up simple security policies with the masking feature

You can play with the services and install other parcels like Kafka/NiFi/Kudu to create a streaming ingestion pipeline and query in real time with Spark/Impala. Of course, you'll need more resources for that; these can be changed at the beginning, during the VM configuration.

AkhilTech · ‎08-06-2020

I just tried to follow the instructions and received the error below when I ran the vagrant up command. Any help is much appreciated.

default:  'result_data_url': 'http://cloudera:7180/cmf/command/12/download',
    default:  'result_message': 'Failed to complete installation.',
    default:  'role_ref': None,
    default:  'service_ref': None,
    default:  'start_time': '2020-08-06T09:22:58.606Z',
    default:  'success': False}
    default: Traceback (most recent call last):
    default:   File "/root/CDPDCTrial/scripts/create_cluster.py", line 76, in <module>
    default:     mgmt_api.auto_assign_roles() # needed?
    default:   File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 65, in auto_assign_roles
    default:     (data) = self.auto_assign_roles_with_http_info(**kwargs)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 131, in auto_assign_roles_with_http_info
    default:     collection_formats=collection_formats)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 326, in call_api
    default:     _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 153, in __call_api
    default:     _request_timeout=_request_timeout)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 379, in request
    default:     body=body)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 273, in PUT
    default:     body=body)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 219, in request
    default:     raise ApiException(http_resp=r)
    default: cm_client.rest.ApiException
    default: : (400)
    default: Reason: Bad Request
    default: HTTP response headers: HTTPHeaderDict({'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'SESSION=9d9bab89-f03e-4fd6-aba7-8eaa64a38128;Path=/;HttpOnly', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Date': 'Thu, 06 Aug 2020 09:24:24 GMT', 'X-Frame-Options': 'DENY', 'Content-Type': 'application/json;charset=utf-8'})
    default: HTTP response body: {
    default:   "message" : "Deployment should contain hosts."
    default: }
    default: usermod: group 'hadoop' does not exist
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin

As it failed to run create_cluster.py, I connected to the Vagrant box via ssh, ran the script manually, and received the error message below.

[root@cloudera ~]# python ~/CDPDCTrial/scripts/create_cluster.py ~/CDPDCTrial/conf/cdpsandbox.json

{'active': False,
 'can_retry': True,
 'children': {'items': []},
 'cluster_ref': None,
 'end_time': '2020-08-06T11:10:10.014Z',
 'host_ref': None,
 'id': 16.0,
 'name': 'GlobalHostInstall',
 'parent': None,
 'result_data_url': 'http://cloudera:7180/cmf/command/16/download',
 'result_message': 'Failed to complete installation.',
 'role_ref': None,
 'service_ref': None,
 'start_time': '2020-08-06T11:08:49.716Z',
 'success': False}
Traceback (most recent call last):
  File "/root/CDPDCTrial/scripts/create_cluster.py", line 76, in <module>
    mgmt_api.auto_assign_roles() # needed?
  File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 65, in auto_assign_roles
    (data) = self.auto_assign_roles_with_http_info(**kwargs)
  File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 131, in auto_assign_roles_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 326, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 153, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 379, in request
    body=body)
  File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 273, in PUT
    body=body)
  File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 219, in request
    raise ApiException(http_resp=r)
cm_client.rest.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'SESSION=3677d4cf-2406-4989-86c9-f6f896700889;Path=/;HttpOnly', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Date': 'Thu, 06 Aug 2020 11:10:10 GMT', 'X-Frame-Options': 'DENY', 'Content-Type': 'application/json;charset=utf-8'})
HTTP response body: {
  "message" : "Deployment should contain hosts."
}

I checked the file /root/CDPDCTrial/scripts/create_cluster.py and can confirm that the hostname has been changed to cloudera, as below:

instargs = cm_client.ApiHostInstallArguments(
    host_names=['cloudera'],
    user_name='root',
    private_key=key,
    cm_repo_url='https://archive.cloudera.com/cm7/7.1.1/',
    java_install_strategy='NONE',
    ssh_port=22,
    passphrase='')

carrossoni · ‎08-07-2020

Hi Akhil,

It seems that the download wasn't completed and the host isn't in a healthy state. Can you log in to http://localhost:7180 with admin/admin?

If yes, can you check under Hosts what the status of the host is?

Thanks,

Luiz

AkhilTech · ‎08-08-2020

@carrossoni , when I open Cloudera Manager, it is taking me to the installation page. That means it has not registered the hosts yet.

The error also confirms this: "Deployment should contain hosts."

It must have something to do with the function below.

mgmt_api.auto_assign_roles() # needed?

If I comment that out and run the command below via "vagrant ssh", it fails with a different error:

python ~/CDPDCTrial/scripts/create_cluster.py ~/CDPDCTrial/conf/cdpsandbox.json

{'active': False,
 'can_retry': True,
 'children': {'items': []},
 'cluster_ref': None,
 'end_time': '2020-08-08T08:06:09.975Z',
 'host_ref': None,
 'id': 16.0,
 'name': 'GlobalHostInstall',
 'parent': None,
 'result_data_url': 'http://cloudera:7180/cmf/command/16/download',
 'result_message': 'Failed to complete installation.',
 'role_ref': None,
 'service_ref': None,
 'start_time': '2020-08-08T08:04:44.671Z',
 'success': False}
Traceback (most recent call last):
  File "/root/CDPDCTrial/scripts/create_cluster.py", line 78, in <module>
    mgmt_api.setup_cms(body=api_service)
  File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 1013, in setup_cms
    (data) = self.setup_cms_with_http_info(**kwargs)
  File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 1091, in setup_cms_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 326, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 153, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 379, in request
    body=body)
  File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 273, in PUT
    body=body)
  File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 219, in request
    raise ApiException(http_resp=r)
cm_client.rest.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'SESSION=b8021e6e-91e8-4b18-895e-8cd1f3a287a0;Path=/;HttpOnly', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Date': 'Sat, 08 Aug 2020 08:06:10 GMT', 'X-Frame-Options': 'DENY', 'Content-Type': 'application/json;charset=utf-8'})
HTTP response body: {
  "message" : "Cannot find a suitable default host for cluster 'null'."
}

carrossoni · ‎08-08-2020

Hi @AkhilTech ,

The primary problem is that the host isn't in a healthy state; this is evaluated before provisioning the cluster. This can happen if there aren't enough resources to provision CDP.

If there are enough resources you can:

Check if the service cloudera-scm-agent is configured properly:

- Check that the file /etc/cloudera-scm-agent/config.ini has the lines server_host= and listening_ip= with values that match the results of the commands "hostname" and "host cloudera".

- Check that the cloudera-scm-agent process is running with "sudo service cloudera-scm-agent status".

- Restart the cloudera-scm-agent with "sudo service cloudera-scm-agent restart".
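
The config.ini check in the list above can be sketched in a few lines of Python. This is only an illustration of the rule described here (server_host should match the output of "hostname" and listening_ip should be the address that "host cloudera" returns). The helper names are hypothetical, and the parser deliberately ignores section headers and commented lines instead of relying on any particular file layout.

```python
import ipaddress


def read_agent_settings(config_text):
    """Collect key=value pairs from config.ini text, ignoring comments and sections."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_agent_settings(settings, hostname, host_ip):
    """Return a list of problems found in server_host and listening_ip."""
    problems = []
    server_host = settings.get("server_host", "")
    if server_host != hostname:
        problems.append("server_host=%r does not match %r" % (server_host, hostname))
    listening_ip = settings.get("listening_ip", "")
    try:
        ipaddress.ip_address(listening_ip)
    except ValueError:
        problems.append("listening_ip=%r is not an IP address" % listening_ip)
    else:
        if listening_ip != host_ip:
            problems.append("listening_ip=%r differs from %s" % (listening_ip, host_ip))
    return problems
```

Inside the VM, pass it the contents of /etc/cloudera-scm-agent/config.ini together with the hostname and the IP address you obtained from the two commands; an empty list means both lines look consistent.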

After this you can check the logs for error messages inside the VM:

- /var/log/cloudera-scm-server/cloudera-scm-server.log

- /var/log/cloudera-scm-agent/cloudera-scm-agent.log

Especially in the cloudera-scm-agent log, look at the last lines, which should contain the error messages explaining why your host isn't healthy.
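
Rather than scrolling through the whole agent log, the last warnings and errors can be filtered out with a short script. The sketch below assumes the agent log line layout "[timestamp] pid thread module LEVEL message" and the usual Python logging level names (ERROR, WARNING); last_problems is a hypothetical helper name.

```python
import re

# Assumed line layout:
# [28/Jul/2021 09:52:05 +0000] 1625 MainThread supervisor   INFO     Some message
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<thread>\S+)\s+"
    r"(?P<module>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def last_problems(log_text, levels=("ERROR", "WARNING"), limit=20):
    """Return the last `limit` (time, module, level, message) tuples at the given levels."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("level") in levels:
            hits.append(
                (
                    match.group("time"),
                    match.group("module"),
                    match.group("level"),
                    match.group("message"),
                )
            )
    return hits[-limit:]


if __name__ == "__main__":
    path = "/var/log/cloudera-scm-agent/cloudera-scm-agent.log"
    with open(path) as handle:
        for when, module, level, message in last_problems(handle.read()):
            print(when, level, module, message)
```

Lines that do not start with a bracketed timestamp, such as the continuation lines of a Python traceback, are skipped, so it is still worth opening the log around the reported timestamps.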

You can also use Cloudera Manager to identify what's happening and restart the agent services. If you click the Cloudera logo at the top left you should see the initial page, and you can go to "Hosts" to see what's happening.

Click the Cloudera Manager icon at the top left

Click Hosts --> All Hosts

Click on the host cloudera. If everything is healthy, you should see something like this:

If not, first try to restart the Cloudera Management Service (Actions -> Restart) and wait to see the results. If there are errors, you can follow the messages to the logs to see what may be causing them. If it comes up, you can run the provisioning Python script again.
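
Before re-running the provisioning script, it may help to confirm from the command line that Cloudera Manager actually knows the host and reports it as healthy, since the "Deployment should contain hosts." error points at an empty host list. The sketch below queries the Cloudera Manager REST API hosts endpoint with the standard library. Treat the details as assumptions to verify against your installation: the API version "v41" is a guess based on the cm_client 41.0.1 package seen in this thread, the helper names are hypothetical, and the response is assumed to carry an "items" list whose entries have "hostname" and "healthSummary" fields.

```python
import base64
import json
import urllib.request


def fetch_hosts(base_url="http://localhost:7180", api_version="v41",
                user="admin", password="admin"):
    """GET the host list from the Cloudera Manager REST API and return parsed JSON."""
    request = urllib.request.Request("%s/api/%s/hosts" % (base_url, api_version))
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def unhealthy_hosts(payload):
    """Return (hostname, healthSummary) for every host that is not reported GOOD."""
    return [
        (host.get("hostname"), host.get("healthSummary"))
        for host in payload.get("items", [])
        if host.get("healthSummary") != "GOOD"
    ]


def ready_to_provision(payload):
    """True only when CM knows at least one host and all of them are GOOD."""
    return bool(payload.get("items")) and not unhealthy_hosts(payload)
```

If ready_to_provision(fetch_hosts()) returns False, either the host list is empty or unhealthy_hosts shows which host still needs attention.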

Let me know if this helps,

Thanks,

Luiz

AkhilTech · ‎08-09-2020

@carrossoni , Thanks for the inputs. After checking the file /etc/cloudera-scm-agent/config.ini

it has

server_host=10.0.2.15

listening_ip=found:

I have changed the values as below.

server_host=cloudera

listening_ip=10.0.2.15

and restarted the cloudera-scm-agent "sudo service cloudera-scm-agent restart"

After a wait of almost 90 minutes, the parcel download completed and I can see the services up and running in CM.

Thanks for your kind support.

MiamiDataEng · ‎02-14-2021

Hi @carrossoni

I followed the tutorial and I was able to start the virtual machine.

1. I entered the cdpvm folder that I had created

2. I ran 'vagrant up'

3. I entered the username and password (I googled for those. I did not know what to put)

3.1. username: vagrant

3.2. password: vagrant

4. It seems I got in. Please see the screen below

I do not know what to do next. Would you please help me with Step 2?

Thanks in advance

Regards

carrossoni · ‎05-06-2021

Hi @MiamiDataEng ,

To access the VM via ssh you just need to use the vagrant command:

$ vagrant ssh

There's no need to enter a username and password.

If CM was installed correctly, you'll be able to access it in your browser via the URL

http://localhost:7180

The user/password is initially admin/admin.

Thanks,

Luiz

Cisco94 · ‎06-28-2021

Hi @carrossoni , do you need a CDP license to follow the installation? I know that You can try the CDP Private Cloud Base Edition of Cloudera Data Platform for 60 days without obtaini...

Will it stop working after that period?

Many thanks for your awesome post,

Cisco

carrossoni · ‎06-28-2021

Hi @Cisco94, Thanks! No, you don't need one, since the process uses the trial repository. After the 60-day trial it won't be possible to access Cloudera Manager or manage the cluster, but the services/data will remain there. Since it's intended for learning purposes, you can quickly spin up a new trial VM again.

duhizjame · ‎07-28-2021

Hi @carrossoni, I have been having the same errors as @AkhilTech, but even before that, during the script execution, the local parcels are not present on the VM:

    default: -- Install CSDs
    default: mv: cannot stat ‘/root/*.jar’: No such file or directory
    default: mv: cannot stat ‘/home/centos/*.jar’: No such file or directory
    default: chown: cannot access ‘/opt/cloudera/csd/*’: No such file or directory
    default: chmod: cannot access ‘/opt/cloudera/csd/*’: No such file or directory
    default: -- Install local parcels
    default: mv: cannot stat ‘/root/*.parcel’: No such file or directory
    default: mv: cannot stat ‘/root/*.parcel.sha’: No such file or directory
    default: mv: cannot stat ‘/home/centos/*.parcel’: No such file or directory
    default: mv: cannot stat ‘/home/centos/*.parcel.sha’: No such file or directory
    default: chown: cannot access ‘/opt/cloudera/parcel-repo/*’: No such file or directory

After that, it just waits forever for CM to boot. After I restarted the Cloudera agent over ssh, this is the output I get:

    default: -- Now CM is started and the next step is to automate using the CM API
    default: ./centosvmCDP.sh: line 178: [: v42: unary operator expected
    default: DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
    default: Requirement already up-to-date: pip in /usr/lib/python2.7/site-packages (20.3.4)
    default: Requirement already up-to-date: cm_client in /usr/lib/python2.7/site-packages (41.0.1)
    default: Requirement already satisfied, skipping upgrade: urllib3>=1.15 in /usr/lib/python2.7/site-packages (from cm_client) (1.26.6)
    default: Requirement already satisfied, skipping upgrade: certifi in /usr/lib/python2.7/site-packages (from cm_client) (2021.5.30)
    default: Requirement already satisfied, skipping upgrade: six>=1.10 in /usr/lib/python2.7/site-packages (from cm_client) (1.16.0)
    default: Requirement already satisfied, skipping upgrade: python-dateutil in /usr/lib/python2.7/site-packages (from cm_client) (2.8.2)
    default: {'active': True,
    default:  'can_retry': False,
    default:  'children': {'items': []},
    default:  'cluster_ref': None,
    default:  'end_time': None,
    default:  'host_ref': None,
    default:  'id': 13.0,
    default:  'name': 'GlobalHostInstall',
    default:  'parent': None,
    default:  'result_data_url': None,
    default:  'result_message': None,
    default:  'role_ref': None,
    default:  'service_ref': None,
    default:  'start_time': '2021-07-28T09:50:10.176Z',
    default:  'success': None}
    default: {'active': False,
    default:  'can_retry': True,
    default:  'children': {'items': []},
    default:  'cluster_ref': None,
    default:  'end_time': '2021-07-28T09:50:16.181Z',
    default:  'host_ref': None,
    default:  'id': 13.0,
    default:  'name': 'GlobalHostInstall',
    default:  'parent': None,
    default:  'result_data_url': 'http://cloudera:7180/cmf/command/13/download',
    default:  'result_message': 'Failed to complete installation.',
    default:  'role_ref': None,
    default:  'service_ref': None,
    default:  'start_time': '2021-07-28T09:50:10.176Z',
    default:  'success': False}
    default: Traceback (most recent call last):
    default:   File "/root/CDPDCTrial/scripts/create_cluster.py", line 76, in <module>
    default:     mgmt_api.auto_assign_roles() # needed?
    default:   File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 65, in auto_assign_roles
    default:     (data) = self.auto_assign_roles_with_http_info(**kwargs)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/apis/mgmt_service_resource_api.py", line 131, in auto_assign_roles_with_http_info
    default:     collection_formats=collection_formats)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 326, in call_api
    default:     _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 153, in __call_api
    default:     _request_timeout=_request_timeout)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/api_client.py", line 379, in request
    default:     body=body)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 273, in PUT
    default:     body=body)
    default:   File "/usr/lib/python2.7/site-packages/cm_client/rest.py", line 219, in request
    default:     raise ApiException(http_resp=r)
    default: cm_client.rest.ApiException: (400)
    default: Reason: Bad Request
    default: HTTP response headers: HTTPHeaderDict({'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'SESSION=ceeaa4a0-3cc5-4c44-b792-73cda7fbe71b;Path=/;HttpOnly', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Date': 'Wed, 28 Jul 2021 09:50:17 GMT', 'X-Frame-Options': 'DENY', 'Content-Type': 'application/json;charset=utf-8'})
    default: HTTP response body: {
    default:   "message" : "Deployment should contain hosts."
    default: }
    default: 
    default: usermod: group 'hadoop' does not exist
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin
    default: sudo: unknown user: hdfs
    default: sudo: unable to initialize policy plugin

This is the latest agent log output:

[28/Jul/2021 09:52:03 +0000] 1625 MainThread agent        INFO     To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[28/Jul/2021 09:52:05 +0000] 1625 MainThread supervisor   INFO     Trying to connect to supervisor (Attempt 1)
[28/Jul/2021 09:52:05 +0000] 1625 MainThread supervisor   INFO     Supervisor version: 3.4.0, pid: 795
[28/Jul/2021 09:52:05 +0000] 1625 MainThread supervisor   INFO     Successfully connected to supervisor
[28/Jul/2021 09:52:05 +0000] 1625 MainThread agent        INFO     Supervisor version: 3.4.0, pid: 795
[28/Jul/2021 09:52:05 +0000] 1625 MainThread agent        INFO     Connecting to previous supervisor: agent-795-1627465844.
[28/Jul/2021 09:52:05 +0000] 1625 MainThread supervisor   INFO     Triggering supervisord update.
[28/Jul/2021 09:52:05 +0000] 1625 MainThread _cplogging   INFO     [28/Jul/2021:09:52:05] ENGINE Bus STARTING
[28/Jul/2021 09:52:05 +0000] 1625 MainThread _cplogging   INFO     [28/Jul/2021:09:52:05] ENGINE Started monitor thread '_TimeoutMonitor'.
[28/Jul/2021 09:52:06 +0000] 1625 MainThread _cplogging   INFO     [28/Jul/2021:09:52:06] ENGINE Serving on http://127.0.0.1:9001
[28/Jul/2021 09:52:06 +0000] 1625 MainThread _cplogging   INFO     [28/Jul/2021:09:52:06] ENGINE Bus STARTED
[28/Jul/2021 09:52:06 +0000] 1625 MainThread status_server INFO     Status server url is http://cloudera:9000/
[28/Jul/2021 09:52:07 +0000] 1625 MainThread daemon       INFO     New monitor: (<cmf.monitor.host.HostMonitor object at 0x7fdcc96d0f90>,)
[28/Jul/2021 09:52:07 +0000] 1625 MonitorDaemon-Scheduler daemon       INFO     Monitor ready to report: ('HostMonitor',)
[28/Jul/2021 09:52:07 +0000] 1625 MainThread agent        INFO     Setting default socket timeout to 45
[28/Jul/2021 09:52:07 +0000] 1625 MainThread agent        INFO     Failed to read available parcel file: [Errno 2] No such file or directory: '/var/lib/cloudera-scm-agent/active_parcels.json'
[28/Jul/2021 09:52:07 +0000] 1625 MainThread agent        INFO     Loading last saved hb response to complete initialization: /var/lib/cloudera-scm-agent/response.avro
[28/Jul/2021 09:52:08 +0000] 1625 MainThread heartbeat_tracker INFO     HB stats (seconds): num:1 LIFE_MIN:0.06 min:0.06 mean:0.06 max:0.06 LIFE_MAX:0.06

/etc/hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# 127.0.1.1 cloudera cloudera
#  cloudera cloudera
 cloudera cloudera

'host cloudera' returns a

Host cloudera not found: 3(NXDOMAIN)

I have tried a lot of combinations of different addresses and listening_ip values in config.ini. Nothing seems to bring the host to good health. I can find the host via CM, but the parcels won't download correctly; they go over 100%, so I canceled at 120%.

PS. pip 8.1.2 had issues installing cm_client (or any other package), so I upgraded it using the get-pip.py script for 2.7.

MariaDB failed for all mirrors, so I installed it using yum localinstall with the .rpm package, which I downloaded using wget.

EDIT: After uncommenting the cloudera address in /etc/hosts, the script finally runs almost to the end, but gets stuck downloading the parcel.

The size was 7.2GB at first, but it just continued downloading until 7.4GB and now it is stuck there.

EDIT2: The parcel is now stuck at 0% distributing after finishing the download. Using ssh I can confirm that the parcel is in /opt/cloudera/parcel-repo.

Meanwhile the host is of unknown health, and the Cloudera Management Services can't be started (error communicating with server).


How to create a Centos7 CDP-DC Base VM for sandbox/learning purposes
