Created 09-01-2016 01:32 PM
Hi,
To avoid Spark's standalone mode and use Ambari to monitor my Spark jobs, I was wondering if I could set up an HDP cluster with only Ambari + Spark + YARN and no other components (or as few as possible), so that I don't need too many nodes just to benefit from the Ambari/Spark integration through YARN.
Thanks,
Nicolas
Created 09-02-2016 05:51 PM
@Nicolas Steinmetz I just tested your use case in my environment, and below are the components that would be needed before you move forward:
1. HDFS
2. YARN
3. ZooKeeper
4. MR
5. Hive
6. Pig Client - You can remove this after the installation is done
7. Slider client - You can remove this after the installation is done
8. Tez Client
9. It will give you a warning for SmartSense and Ambari Metrics, but you can bypass that.
10. Spark
Note - I tested this with HDP 2.5 and Ambari 2.4.0.1
Please find the attached screenshots for reference; a rough sketch of this layout as an Ambari Blueprint follows below.
(Attachments: untitled.png, untitled-1.png, untitled-2.png, untitled-3.png, untitled-4.png)
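For what it's worth, here is a minimal sketch of how that component list could be expressed as an Ambari Blueprint and registered through Ambari's REST API. The server address, credentials, host-group layout, and exact component names for your stack version are assumptions; check them against your own Ambari instance rather than treating this as a recipe.

```python
# Minimal sketch: register an Ambari Blueprint mirroring the component list above.
# ASSUMPTIONS: Ambari host/port, admin credentials, and per-stack component names.
import json
import requests

AMBARI_URL = "http://ambari-host:8080/api/v1"   # assumed Ambari server address
AUTH = ("admin", "admin")                        # assumed default credentials
HEADERS = {"X-Requested-By": "ambari"}           # header required by the Ambari API

# One master host group and one worker host group; services follow the list above.
blueprint = {
    "Blueprints": {"blueprint_name": "minimal-spark-yarn",
                   "stack_name": "HDP", "stack_version": "2.5"},
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"}, {"name": "SECONDARY_NAMENODE"},
                {"name": "RESOURCEMANAGER"}, {"name": "APP_TIMELINE_SERVER"},
                {"name": "HISTORYSERVER"}, {"name": "ZOOKEEPER_SERVER"},
                {"name": "HIVE_METASTORE"}, {"name": "HIVE_SERVER"},
                {"name": "SPARK_JOBHISTORYSERVER"},
            ],
        },
        {
            "name": "worker",
            "cardinality": "3",
            "components": [
                {"name": "DATANODE"}, {"name": "NODEMANAGER"},
                {"name": "SPARK_CLIENT"}, {"name": "TEZ_CLIENT"},
                {"name": "PIG"}, {"name": "SLIDER"},
                {"name": "MAPREDUCE2_CLIENT"}, {"name": "ZOOKEEPER_CLIENT"},
                {"name": "HDFS_CLIENT"}, {"name": "YARN_CLIENT"},
            ],
        },
    ],
}

# Register the blueprint; a separate cluster-creation request would then reference it.
resp = requests.post(f"{AMBARI_URL}/blueprints/minimal-spark-yarn",
                     auth=AUTH, headers=HEADERS, data=json.dumps(blueprint))
resp.raise_for_status()
print("Blueprint registered:", resp.status_code)
```

The clients listed as removable above (Pig, Slider) are included only so installation succeeds; you can drop them from the blueprint or delete them afterwards, as noted.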
Created 09-01-2016 06:25 PM
I believe you would need HDFS, MR, and ZooKeeper in addition to YARN and Spark. Ambari will not let you move forward without these components.
Created 09-02-2016 04:51 PM
At the bare minimum, the cluster will need the following components: HDFS (data storage), MR (processing), ZooKeeper (distributed coordination), YARN (resource management), Ambari (component deployment and monitoring), and then Spark for your processing. Ambari will not proceed with the deployment without these components.
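To illustrate the point of the whole setup, here is a minimal sketch of a Spark job running on YARN instead of standalone mode, so it appears as a YARN application in the ResourceManager UI that Ambari exposes. It assumes the Spark client is installed on the node and that HADOOP_CONF_DIR points at the cluster's YARN/HDFS configuration; you would normally launch something like this with `spark-submit --master yarn`.

```python
# Minimal sketch: run a trivial Spark job on YARN so it shows up in the
# ResourceManager UI. ASSUMPTIONS: Spark client installed, HADOOP_CONF_DIR set.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("yarn-smoke-test")   # application name visible in the YARN UI
         .master("yarn")               # run on YARN instead of standalone/local
         .getOrCreate())

# Trivial workload: parallelize a range and sum it, just to generate a YARN application.
total = spark.sparkContext.parallelize(range(1000)).sum()
print("sum =", total)

spark.stop()
```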
Created 09-05-2016 08:01 AM
Hi @lraheja
Thanks for your precise answer (and thanks to the other people too 🙂 )
Side question: does this not force me to have too many machines? I would like a minimum-sized cluster for this need.
Thanks,
Nicolas
Created 09-05-2016 05:26 PM
It would depend on your needs. If dfs.replication is 3 (the default), each block is replicated to 3 DataNodes, so you would need at least 3 machines, each running a DataNode. You can configure this value in HDFS, and you would then need at least that many machines. Usually one goes for a 5-node cluster: 1 master node, 3 data nodes, and 1 edge node (with all the clients on it).
If your replication factor is 2, then you can build a cluster with 2 nodes too.
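A small sketch of the arithmetic above: read dfs.replication from the cluster's hdfs-site.xml and report the minimum number of DataNodes you would need. The config path is an assumption (the usual HDP client location); adjust it if your configuration files live elsewhere.

```python
# Minimal sketch: derive the DataNode floor from dfs.replication.
# ASSUMPTION: hdfs-site.xml lives at the standard HDP client path below.
import xml.etree.ElementTree as ET

HDFS_SITE = "/etc/hadoop/conf/hdfs-site.xml"  # assumed config location

def replication_factor(path: str, default: int = 3) -> int:
    """Return dfs.replication from hdfs-site.xml, or the HDFS default of 3."""
    root = ET.parse(path).getroot()            # root element is <configuration>
    for prop in root.findall("property"):
        if prop.findtext("name") == "dfs.replication":
            return int(prop.findtext("value"))
    return default

factor = replication_factor(HDFS_SITE)
# Every block must land on `factor` distinct DataNodes, so that is the floor
# on the number of worker machines in the cluster.
print(f"dfs.replication = {factor}; you need at least {factor} DataNodes")
```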
Created 09-06-2016 12:46 PM
Hi @lraheja
Thanks for the clarification; I'll share this with the rest of the team and we'll see whether we go with this option or not.
Thanks a lot
Nicolas