I have been using the sandbox for a while and now I need to take my testing to another level. I have 5 Dell PowerEdge 2850 servers, each with no less than 8 GB of RAM and between 2 and 4 cores. I intend to do a manual installation of HDP and lay out my file system as /u01, /u02, /u03, /u04, /u05, each partition being about 200 GB. Where can I download all the components of HDP 2.3.2 as .gz tarballs rather than RPMs? The RPM-based install (HDP, HDP-UTILS) puts the packages in /var etc. Is there a way I could tell the install to put everything under, for example, /u01/hadoop (pig, oozie, hdfs, yarn)?
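For what it's worth, the per-component layout described above can be sketched as a few directories under the /u0N mount points. This is just an illustration of the intended structure, not an HDP installer step; the `ROOT` variable is only there so the sketch can run safely in a scratch directory instead of real mount points:

```shell
# Sketch of the desired layout: component directories under /u0N mounts.
# ROOT defaults to a scratch directory; on a real host it would be "".
ROOT="${ROOT:-/tmp/hdp-layout-demo}"

# One directory per 200 GB partition named in the question.
for fs in u01 u02 u03 u04 u05; do
  mkdir -p "$ROOT/$fs"
done

# Example component directories, all under one mount point as asked.
for comp in pig oozie hdfs yarn; do
  mkdir -p "$ROOT/u01/hadoop/$comp"
done

ls "$ROOT/u01/hadoop"
```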
This is useful, but I'd like to point out that this is not yet supported by Hortonworks.
@Artem Ervits thanks for that point. So what is the standard file system layout for HDP? I intend to take my certification in 3 months and I need to grasp the production configuration as compared to the Sandbox.
@Geoffrey Shelton Okot the packages are installed in various locations: /var/run/hadoop, /etc/hadoop/, /var/log/hadoop, etc. The Pig, Sqoop, HBase, etc. jars are installed in /usr/hdp/2.x/. The hdp-select tool then creates symlinks: client tools are symlinked to /usr/hdp/current/pig-client, /usr/hdp/current/hbase-client, etc., and server tools to /usr/hdp/current/hbase-server, /usr/hdp/current/hive-server, etc. The Sandbox is not a good representation of what production should be; the Sandbox is designed to be a single-node generic cluster. It was designed to be shut down, not to maintain recovery state.
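To make the symlink scheme concrete, here is a minimal sketch of the /usr/hdp structure that hdp-select maintains. It runs in a scratch directory with an illustrative version string; on a real node these links live under /usr/hdp and are managed by hdp-select itself:

```shell
# Sketch of the /usr/hdp layout maintained by hdp-select.
# ROOT and VER are illustrative stand-ins for /usr/hdp and the HDP build.
ROOT="${ROOT:-/tmp/usr-hdp-demo}"
VER="2.3.2.0-2950"   # illustrative build string

mkdir -p "$ROOT/$VER/pig" "$ROOT/$VER/hbase" "$ROOT/current"

# "current" entries point at the versioned install directories:
ln -sfn "$ROOT/$VER/pig"   "$ROOT/current/pig-client"
ln -sfn "$ROOT/$VER/hbase" "$ROOT/current/hbase-client"
ln -sfn "$ROOT/$VER/hbase" "$ROOT/current/hbase-server"

# Upgrading is essentially repointing these symlinks to a new version
# directory, which is why the install paths are not meant to be moved.
readlink "$ROOT/current/pig-client"
```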
@Artem Ervits that's exactly what I realised in my current cluster installation. Typically, when installing a Linux server, /var, /etc and /usr are for system-related packages. I come from an Oracle Apps DBA background, and I install my EBS in separate file systems like /u01 through /u06.
I have done a couple of standalone installs using the same Oracle-style layout, but that was only while testing Hadoop 2.x (HDFS, YARN, MRv2, Pig, etc.). I wanted to replicate a full HDP installation, including all the packages, using Ambari. I have tried, but the installations all use /usr, /var and /etc, which is not what I want.
Using Ambari I successfully create my cluster, and host registration and confirmation succeed. In Customize Services I change the NameNode and DataNode directories, but the install doesn't succeed 100% on all nodes. The Ambari install also creates too many users (oozie, tez, accumulo, etc.); I intend to use only one hadoop user and group in the cluster for simplicity's sake.
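For the NameNode and DataNode directories specifically, the relevant HDFS properties (set in Ambari under HDFS > Configs and written into hdfs-site.xml) would look something like this with the /u0N mounts discussed in this thread. The property names are the standard Hadoop 2.x ones; the paths themselves are illustrative:

```xml
<!-- hdfs-site.xml fragment; paths follow the /u0N layout from this thread -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/u01/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/u02/hadoop/hdfs/data,/u03/hadoop/hdfs/data,/u04/hadoop/hdfs/data</value>
</property>
```

Note this only relocates HDFS metadata and block storage; it does not move the package, config, or log locations under /usr, /etc and /var.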
This is an old thread. Unless you came up with your own solution: Ambari still does not support customizing the directories for configurations, jars and scripts. More recent Ambari releases do support customizing the PID and LOG directories for Ambari itself; I know this is for Ambari only, but it's a small step. Let's close this thread, shall we?
Refer to the HDP directory layout recommendations.
Refer to the LOG and PID directory recommendations.