Hello everyone. I downloaded the Cloudera QuickStart VM, opened it with VirtualBox, and created a VM. My next step is to create a Hadoop cluster from 3 VMs (1 name node and 2 data nodes). Using the bridged adapter in the network configuration and editing /etc/hosts, I managed to get all 3 VMs communicating with each other, so everything is fine up to this point. However, based on a guide I found on the internet, I have 2 questions.

1) The steps in this guide use the path ~/hadoop/etc/hadoop/.... . But in the QuickStart VM (CentOS), there is no hadoop folder installed in ~. The only hadoop folder I can find is at /etc/hadoop/..

2) At some point the guide says we have to configure the file ~/hadoop/etc/hadoop/workers. But when I go to /etc/hadoop/, there is no file named workers.

So these files do not exist in the Cloudera VM. Is this a problem? Or does the Cloudera VM have a different way of creating a Hadoop cluster of VMs? Thank you in advance!
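For reference, the /etc/hosts step described above can be sanity-checked with a small script. This is only a sketch: the hostnames (namenode, datanode1, datanode2), the IP addresses, and the helper names are hypothetical placeholders, not values from the actual VMs.

```python
def parse_hosts(text):
    """Map every hostname or alias found in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry for a name, as resolvers typically do.
            mapping.setdefault(name, ip)
    return mapping


def missing_nodes(text, expected):
    """Return the expected hostnames that have no entry in the hosts text."""
    mapping = parse_hosts(text)
    return [name for name in expected if name not in mapping]


# Hypothetical hosts-file content for a 3-VM setup (placeholder addresses).
SAMPLE = """
127.0.0.1     localhost
192.168.1.10  namenode
192.168.1.11  datanode1   # first data node
192.168.1.12  datanode2
"""

EXPECTED = ["namenode", "datanode1", "datanode2"]

if __name__ == "__main__":
    print(parse_hosts(SAMPLE))
    print("missing:", missing_nodes(SAMPLE, EXPECTED))
```

To check a real VM, the contents of /etc/hosts could be read with `open("/etc/hosts").read()` and passed to `missing_nodes` in place of `SAMPLE`, once on each of the three machines.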
Although it is based on the sources of the open source Apache Hadoop project, the Cloudera QuickStart is not freely interchangeable with the Hadoop distribution described on the site where you found that guide.
If you want to use the QuickStart VM, the appropriate instructions can be found at docs.cloudera.com.
If you want to download and use the Hadoop tarball referred to in the guide you mentioned, you will be much more likely to get a useful answer if you post your question to the comments section of the site you read that guide on.
OK, let's follow the official guide (https://docs.cloudera.com/documentation/enterprise/5-13-x/topics/quickstart_vm_administrative_inform...). However, I cannot find any information in this guide on how to build a Hadoop cluster from QuickStart VMs. Where can I find the relevant information?
The Cloudera Quickstart is a single-host deployment of the Cloudera open-source distribution, including CDH and Cloudera Manager. The purpose of it is to enable developers and operators to learn Hadoop and related software from the Hadoop ecosystem, to try out new ideas, and to test and demonstrate their applications.
With that purpose in mind, it makes sense that there is no easily accessible guide to creating a multi-node Hadoop cluster from Cloudera QuickStart VMs. While I believe it is possible, it would require a lot of manipulation of the configuration and deployment choices, and for someone with the expertise to pull that off, it would defeat the purpose of the QuickStart as a distribution.
Perhaps some other member of the community knows how to do this and will create an article on how to do it or post a link to such a guide in this thread.