Support Questions

Metron deployment fails

Explorer

I am following the Quick Platform Deployment and get the following error:

TASK [quick_dev : Delete the Metron Components from Ambari] ********************

changed: [node1] => (item=METRON_ENRICHMENT_MASTER)

changed: [node1] => (item=METRON_INDEXING)

changed: [node1] => (item=METRON_PARSERS)

TASK [quick_dev : Remove the Metron packages] **********************************

changed: [node1] => (item=[u'metron-common', u'metron-data-management', u'metron-parsers', u'metron-enrichment', u'metron-indexing', u'metron-elasticsearch'])

TASK [quick_dev : Create local repo with new packages] *************************

ok: [node1]

TASK [quick_dev : Re-install the Metron Packages via Ambari] *******************

changed: [node1] => (item=METRON_ENRICHMENT_MASTER)

changed: [node1] => (item=METRON_INDEXING)

changed: [node1] => (item=METRON_PARSERS)

TASK [quick_dev : Start the ambari cluster] ************************************

fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "Request failed with status FAILED"}

to retry, use: --limit @/Users/v834647/Downloads/metron-master/metron-deployment/playbooks/metron_full_install.retry

PLAY RECAP *********************************************************************

node1 : ok=62 changed=15 unreachable=0 failed=1

Ansible failed to complete successfully. Any error output should be

visible above. Please fix these errors and try again.

tca0080alkvtaoq:quick-dev-platform v834647$ sudo vagrant up

Password:

Running with ansible-tags: ["quick_dev", "report"]

The VirtualBox VM was created with a user that doesn't match the

current user running Vagrant. VirtualBox requires that the same user

be used to manage the VM that was created. Please re-run Vagrant with

that user. This is not a Vagrant issue.

The UID used to create the VM was: 200001609

Your UID is: 0

tca0080alkvtaoq:quick-dev-platform v834647$ sudo vagrant up


Explorer

Can somebody respond, please?

Contributor

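This happens because you are running it with sudo. Under sudo your UID is 0 (root), which does not match the UID that created the VM, so VirtualBox refuses to manage it. You do not need sudo to run `vagrant up`.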
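A minimal sketch of a guard you could put in a wrapper script before calling `vagrant up`, assuming a POSIX shell. The helper name `refuse_root` is hypothetical; it only encodes the rule from the error above, where the VM was created with UID 200001609 but `sudo` made the current UID 0.

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) only when the given UID is not root.
# With no argument it checks the UID of the user running the script.
refuse_root() {
  uid="${1:-$(id -u)}"
  if [ "$uid" -eq 0 ]; then
    echo "Do not run vagrant with sudo: the VM was created by a non-root user." >&2
    return 1
  fi
  return 0
}

# Example (run from the directory that holds the Vagrantfile):
#   refuse_root && vagrant up
```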

Explorer

OK, I tried without sudo and it stalls here:

%vagrant up

Running with ansible-tags: ["quick_dev", "report"]

Bringing machine 'node1' up with 'virtualbox' provider...

==> node1: Checking if box 'metron/quick_dev' is up to date...

==> node1: Resuming suspended VM...

==> node1: Booting VM...

==> node1: Waiting for machine to boot. This may take a few minutes...

node1: SSH address: 127.0.0.1:2222

node1: SSH username: vagrant

node1: SSH auth method: private key

Rising Star

Don't use Quick Dev; use Full Dev as per my answer. Good luck!

Rising Star

I would highly suggest that you use the "Full Dev" environment. Using `vagrant destroy` will just delete any half-completed VM that you may have created when running into these issues. That will allow you to start fresh.

Otto is also right; don't use sudo.

cd metron-deployment/vagrant/full-dev-platform

vagrant destroy

vagrant up

Explorer

OK, I tried that as well and got this error:

==> node1: Machine booted and ready!

==> node1: Checking for guest additions in VM...

node1: The guest additions on this VM do not match the installed version of

node1: VirtualBox! In most cases this is fine, but in rare cases it can

node1: prevent things such as shared folders from working properly. If you see

node1: shared folder errors, please make sure the guest additions within the

node1: virtual machine match the version of VirtualBox you have installed on

node1: your host and reload your VM.

node1:

node1: Guest Additions Version: 5.0.20

node1: VirtualBox Version: 5.1

==> node1: Setting hostname...

The following SSH command responded with a non-zero exit status.

Vagrant assumes that this means the command failed!

# Update sysconfig

sed -i 's/\(HOSTNAME=\).*/\1node1/' /etc/sysconfig/network

# Update DNS

sed -i 's/\(DHCP_HOSTNAME=\).*/\1"node1"/' /etc/sysconfig/network-scripts/ifcfg-*

# Set the hostname - use hostnamectl if available

echo 'node1' > /etc/hostname

if command -v hostnamectl; then

hostnamectl set-hostname --static 'node1'

hostnamectl set-hostname --transient 'node1'

else

hostname -F /etc/hostname

fi

# Prepend ourselves to /etc/hosts

grep -w 'node1' /etc/hosts || {

sed -i'' '1i 127.0.0.1\tnode1\tnode1' /etc/hosts

}

# Restart network

service network restart

Stdout from the command:

Shutting down interface eth0: [ OK ]

Shutting down interface eth1: [ OK ]

Shutting down loopback interface: [ OK ]

Bringing up loopback interface: [ OK ]

Bringing up interface eth0:

Determining IP information for eth0... done.

[ OK ]

Bringing up interface eth1: Determining if ip address 192.168.66.121 is already in use for device eth1...

Error, some other host (08:00:27:AE:B2:94) already uses address 192.168.66.121.

[FAILED]
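The conflicting MAC address in this output, 08:00:27:AE:B2:94, starts with 08:00:27, the prefix VirtualBox uses for the virtual network adapters it creates. That suggests another VirtualBox VM on the same host (plausibly the quick-dev VM from the earlier attempt) still holds 192.168.66.121. A small sketch, assuming a POSIX shell; `is_virtualbox_mac` is a hypothetical helper, not part of Vagrant or VirtualBox.

```shell
# Hypothetical helper: return 0 if a MAC address carries the 08:00:27 prefix
# that VirtualBox assigns to its virtual NICs (comparison is case-insensitive).
is_virtualbox_mac() {
  mac=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$mac" in
    08:00:27:*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_virtualbox_mac 08:00:27:AE:B2:94 && VBoxManage list runningvms` lists the running VMs on the host; stopping the stray one (for example with `vagrant halt` in its directory) should free the address before you run `vagrant up` again.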

Stderr from the command:

Explorer

So I brought quick-dev down in VirtualBox and ran `vagrant up` again. Now I get this error:

%vagrant up

Running with ansible-skip-tags: ["sensors", "quick_dev"]

Bringing machine 'node1' up with 'virtualbox' provider...

==> node1: Checking if box 'metron/centos_base' is up to date...

==> node1: [vagrant-hostmanager:guests] Updating hosts file on active guest virtual machines...

==> node1: [vagrant-hostmanager:host] Updating hosts file on your workstation (password may be required)...

==> node1: Running provisioner: ansible...

node1: Running ansible-playbook...

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************

fatal: [node1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host node1 port 22: Operation timed out\r\n", "unreachable": true}

to retry, use: --limit @/Users/v834647/Downloads/metron-master/metron-deployment/playbooks/metron_full_install.retry

PLAY RECAP *********************************************************************

node1 : ok=0 changed=0 unreachable=1 failed=0

Ansible failed to complete successfully. Any error output should be

visible above. Please fix these errors and try again.

Contributor

I would suggest:

  • Go into /etc/hosts and remove any node1 entries you see.
  • Go into the quick-dev directory and run `vagrant destroy`.
  • Go into the full-dev directory and run `vagrant destroy`.
  • Then, in the full-dev directory, run `vagrant up`.
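/etc/hosts is a plain-text file on the host workstation (the machine running Vagrant), and replacing it normally requires sudo. The node1 entries were added there by the vagrant-hostmanager plugin seen in the `vagrant up` output. A minimal sketch of the steps above, assuming a POSIX shell and the metron-deployment/vagrant directory layout mentioned in this thread; `strip_node1` is a hypothetical helper that only prints the filtered file, so you can review it before installing it.

```shell
# Hypothetical helper: print a hosts file with every line that contains the
# whole word "node1" removed. The input file itself is not modified.
strip_node1() {
  # grep -v exits with status 1 when no lines remain; treat that as success.
  grep -vw 'node1' "$1" || [ $? -eq 1 ]
}

# Example (run on the host, from metron-deployment/vagrant):
#   cp /etc/hosts /tmp/hosts.bak
#   strip_node1 /etc/hosts > /tmp/hosts.new && sudo cp /tmp/hosts.new /etc/hosts
#   (cd quick-dev-platform && vagrant destroy -f)
#   (cd full-dev-platform && vagrant destroy -f && vagrant up)
```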

Explorer

Where is "/etc/hosts"? You wrote:

  • Go into /etc/hosts and remove any node1 entry you see

Explorer

OK, I did the above and got the following:

TASK [ambari_config : check if ambari-server is up on node1:8080] **************

ok: [node1]

TASK [ambari_config : Deploy cluster with Ambari; http://node1:8080] ***********

fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "Request failed with status TIMEDOUT"}

to retry, use: --limit @/Users/v834647/Downloads/metron-master/metron-deployment/playbooks/metron_full_install.retry

PLAY RECAP *********************************************************************

node1 : ok=52 changed=33 unreachable=0 failed=1

Ansible failed to complete successfully. Any error output should be

visible above. Please fix these errors and try again.

Explorer

I can access http://node1:8080, so I'm not sure why the task above fails.
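The UI answering on port 8080 only shows that ambari-server is up. The failing task appears to track the cluster deployment request submitted to Ambari, and the error says that request ended with status TIMEDOUT. One way to inspect the request directly is Ambari's REST API, where `/api/v1/clusters/<cluster>/requests` lists requests and `/api/v1/clusters/<cluster>/requests/<id>` returns one of them. A sketch, assuming a POSIX shell, the default admin/admin credentials, and JSON containing a `"request_status"` field; `ambari_request_status` is a hypothetical helper, and `<cluster>` and `<id>` are placeholders you would fill in from the request list.

```shell
# Hypothetical helper: read an Ambari request JSON document on stdin and print
# the value of its "request_status" field (e.g. COMPLETED, FAILED, TIMEDOUT).
ambari_request_status() {
  sed -n 's/.*"request_status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# Example (placeholders; credentials assumed to be admin/admin):
#   curl -s -u admin:admin \
#     "http://node1:8080/api/v1/clusters/<cluster>/requests/<id>" | ambari_request_status
```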

Explorer

Also, what is the login for the VirtualBox VM? I can now access http://node1:8080, but the GUI shows that the heartbeat is lost for all services.

Super Collaborator

Run `vagrant ssh` from the quick-dev folder to SSH into the VirtualBox VM.

From the Ambari UI, did you try restarting all the services?

Also, please paste the output of the following script:

metron-deployment/scripts/platform-info.sh

Explorer
Most services show up after a reboot.

Attachments: screen-shot-2017-07-27-at-11500-pm.png, platform-info.txt

Contributor

@ppp rrr, you need to RUN platform-info.sh and paste its output. We have our own copy of the script 😉

Also, in Ambari, the Alerts at the top left will show the actual errors from the start operation.

Explorer

I cannot sign in to http://node1:8080 anymore.

Explorer

Hello, I restarted the node1 VM and cannot connect to Ambari anymore. How do I verify that Ambari is running?
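Two kinds of check can help here. Inside the VM, `sudo ambari-server status` and `sudo ambari-agent status` report whether the server and agent processes are running; a stopped agent is a common cause of the lost-heartbeat state mentioned earlier in the thread. From the host, a port probe shows whether anything is listening on node1:8080 at all. A sketch, assuming bash (it relies on bash's /dev/tcp redirection, which plain sh does not provide); `port_open` is a hypothetical helper.

```shell
# Hypothetical helper (bash only): return 0 if a TCP connection to host:port succeeds.
port_open() {
  # Open the connection in a subshell so file descriptor 3 is closed afterwards.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Example (run on the host, from the directory holding the Vagrantfile):
#   port_open node1 8080 && echo "node1:8080 is open" || echo "nothing listening on node1:8080"
#   vagrant ssh -c 'sudo ambari-server status; sudo ambari-agent status'
#   vagrant ssh -c 'sudo ambari-server start; sudo ambari-agent restart'
```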

Explorer

Sorry, I enclosed the wrong file. Now I cannot reach Ambari at http://node1:8080 anymore.

Explorer

So all the services are up now. I have close to 20 Snort instances firing alerts. What are the simple steps to get these alerts into a database for further analysis with ELK and ML?