We prepared three VMs with 16 GB RAM each (Ubuntu 16.04, 64-bit), and we need to deploy the HDF 3.1.1 Docker image to them to test an HDF cluster.
1. Should we install Docker on all three VMs, deploy the HDF 3.1.1 Docker image to each, and then create the NiFi cluster through Ambari?
2. Do we need to make one VM the master HDF node and the other two VMs slave nodes for the HDF cluster?
3. Can we install MySQL Cluster on two of those three VMs to store our flow data?
4. For a production deployment, can we rely on the HDF Docker image, or do we need to deploy HDF from scratch by installing the HDF packages?
I would recommend deploying the cluster directly onto your three machines (VMs). You shouldn't need Docker at all to set up an HDF cluster using Ambari. The step-by-step guide for how to set it up is here:
One thing you will need is internet access from your nodes, so that you can download Ambari and then set up the rest of the services on the cluster. Once Ambari is running, the rest is a click-through, UI-based process.
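As a rough sketch of what that guide walks you through, setting up Ambari on the management node looks roughly like the following. The repository URL and version path here are illustrative placeholders, not the exact ones — take those from the install guide for your Ambari/HDF versions.

```shell
# Fetch the Ambari repo definition for Ubuntu 16 (URL/version path below is
# illustrative -- copy the exact one from the install guide).
wget -O /etc/apt/sources.list.d/ambari.list \
  "http://public-repo-1.hortonworks.com/ambari/ubuntu16/2.x/updates/<version>/ambari.list"
apt-get update

# On the management node only: install and initialize the Ambari server.
apt-get install -y ambari-server
ambari-server setup -s     # -s accepts the defaults (embedded DB, JDK, etc.)
ambari-server start        # web UI then listens on port 8080

# On every node (including the management node): install the Ambari agent.
apt-get install -y ambari-agent
```

From there the cluster-install wizard in the web UI takes over, which is the click-through part.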
For your 3-node cluster, you will need to designate one node as both a 'management' node and a 'worker' node. That node will run Ambari as well as any other master/management services you end up installing as part of HDF (such as Storm Nimbus).
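If you'd rather script that layout than click through the wizard, Ambari can also take it as a "blueprint" that encodes exactly this management/worker split. A minimal hand-written sketch follows — the blueprint name, host-group names, and component list are my own illustrations; verify the component names against your HDF stack definition before using it.

```shell
# Sketch of an Ambari blueprint: one management host group, two workers.
# Component names (NIFI_MASTER, NIMBUS, etc.) vary by stack version --
# check them against the HDF stack you actually install.
cat > blueprint.json <<'EOF'
{
  "Blueprints": { "blueprint_name": "hdf-3node", "stack_name": "HDF", "stack_version": "3.1" },
  "host_groups": [
    { "name": "management",
      "cardinality": "1",
      "components": [ { "name": "NIFI_MASTER" }, { "name": "ZOOKEEPER_SERVER" },
                      { "name": "NIMBUS" }, { "name": "METRICS_COLLECTOR" } ] },
    { "name": "worker",
      "cardinality": "2",
      "components": [ { "name": "NIFI_MASTER" }, { "name": "ZOOKEEPER_SERVER" },
                      { "name": "SUPERVISOR" } ] }
  ]
}
EOF

# You would then register it with the Ambari REST API, e.g.:
#   curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
#        -d @blueprint.json http://<ambari-host>:8080/api/v1/blueprints/hdf-3node
```

For a first 3-node test cluster, though, the UI wizard is the simpler route.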
Yes, you can install MySQL alongside HDF on two of those nodes if you want a MySQL HA configuration, but be aware that this may impact performance: services like NiFi and Kafka need dedicated disks for maximum throughput.
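To make the disk point concrete: NiFi keeps its flowfile, content, and provenance repositories on local disk, and the usual advice is to point each at its own mount so MySQL (or anything else) isn't competing for the same spindles. A sketch of the relevant nifi.properties entries — the /data/diskN mount points are illustrative:

```shell
# Illustrative nifi.properties fragment: each NiFi repository on its own
# disk (mount points /data/disk1..3 are placeholders for your layout).
cat >> nifi.properties <<'EOF'
nifi.flowfile.repository.directory=/data/disk1/flowfile_repository
nifi.content.repository.directory.default=/data/disk2/content_repository
nifi.provenance.repository.directory.default=/data/disk3/provenance_repository
EOF
```

In Ambari you would set these through the NiFi service's configuration screen rather than by editing the file directly.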
For a production deployment, bare metal (rather than VMs or Docker) is recommended for the highest performance, so yes, you will need to install HDF from scratch using the instructions linked above.
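The from-scratch path boils down to installing Ambari on the bare-metal hosts and then adding the HDF management pack before running the cluster-install wizard. A sketch of the mpack step — the tarball URL is a placeholder, since the exact one is published in the HDF 3.1.1 release documentation:

```shell
# Teach an existing Ambari server about the HDF stack by installing the
# HDF management pack (URL is a placeholder -- get the real one from the
# HDF 3.1.1 release docs), then restart so the stack shows up in the wizard.
ambari-server install-mpack \
  --mpack="<hdf-ambari-mpack-3.1.1.0 tarball URL>" \
  --verbose
ambari-server restart
```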