Created on 09-09-2016 10:12 AM - edited 09-16-2022 03:38 AM
A prod. cluster is already in place - HDP-2.4.2.0-258 installed using Ambari 2.2.2.0.
Following are the existing and upcoming scenarios :
I have a few thoughts :
Can the community help me assess the viability of this requirement or suggest alternatives?
Created 09-09-2016 01:56 PM
This sounds to me like you would want to have customized sandboxes that have
Is my understanding correct?
If so, provisioning the VMs as Vagrant boxes might be an option. Vagrant boxes can be provisioned and spun up very quickly. Also, due to the layered nature of Vagrant, you can add diverse data sets on top of the vanilla HDP sandboxes.
An excellent description of how to create a Vagrant box out of an HDP sandbox image is here. It is written for a Mac, but the instructions for a Windows machine should in principle be similar.
Created 09-09-2016 02:46 PM
Yes, that's correct
The 'built-in' data set will be different/customized for each VM spawned (as it will be used by different roles).
The tutorial seems informative, but I have a question: can Vagrant connect to the prod. cluster WITHOUT MAJOR changes to the prod. machines and spawn VMs as required with custom data sets? Apologies if it sounds stupid, but I'm unable to visualize how Vagrant will work with the prod. cluster.
Created 09-10-2016 02:03 PM
Vagrant provides a VM that is run by the provider of your choice, for instance VirtualBox or VMware. The network configuration of your VM determines whether it can connect to the outside network. Typically, in your example you would use one of two configurations:
If you want to "bake" your data sets into your Vagrant boxes, this can all be scripted. To always get the most recent version of the data set, you might want to create a Vagrant box, based on a plain sandbox, that just goes out to the production system and fetches its data the first time it is spun up. Because the Vagrant box acts as a client using standard APIs, generally speaking I believe you would not have to change your production systems. To give you a precise answer I would need to know your case in more detail, though.
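To make the "fetch on first boot" idea a bit more concrete, here is a minimal sketch of a script that a Vagrant shell provisioner could call inside the VM. It is only an illustration under several assumptions that are not from your description: that WebHDFS is enabled on the production NameNode, that it listens on port 50070 (the usual default for HDP 2.x), and that the cluster is not Kerberized. The host name, HDFS path, user name, and marker file are all hypothetical placeholders.

```python
import os
import shutil
import urllib.parse
import urllib.request


def webhdfs_open_url(namenode_host, hdfs_path, port=50070, user=None):
    """Build a WebHDFS OPEN URL for one file on the production cluster.

    Assumption: WebHDFS is enabled and reachable at namenode_host:port.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be an absolute HDFS path")
    query = {"op": "OPEN"}
    if user:
        # Simple (non-Kerberos) authentication passes the user as a parameter.
        query["user.name"] = user
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        namenode_host, port,
        urllib.parse.quote(hdfs_path),
        urllib.parse.urlencode(query),
    )


def fetch_once(url, dest, marker, opener=urllib.request.urlopen):
    """Download url to dest unless the marker file already exists.

    Returns True if data was fetched, False if a previous run already did it.
    The opener argument exists only so the download can be replaced in tests.
    """
    if os.path.exists(marker):
        return False
    with opener(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    # The marker makes later provisioning runs a no-op.
    with open(marker, "w") as m:
        m.write(url + "\n")
    return True


if __name__ == "__main__":
    # Hypothetical values; replace with the real NameNode and data set path.
    url = webhdfs_open_url("namenode.example.com", "/data/role_a/sample.csv",
                           user="analyst")
    fetched = fetch_once(url, "/tmp/sample.csv", "/tmp/.dataset_fetched")
    print("fetched" if fetched else "already present")
```

Note that a WebHDFS OPEN request is redirected from the NameNode to a DataNode, so the VM's network configuration has to let it resolve and reach the DataNode hosts as well. Since the script only reads over HTTP, the production machines themselves should not need changes beyond allowing that access.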