<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Recommendation of a Multi-Node cluster in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Recommendation-of-a-Multi-Node-cluster/m-p/131788#M27250</link>
    <description>&lt;P&gt;"Should I go with HDP 2.3 or 2.4 ?"&lt;/P&gt;&lt;P&gt;I would lean toward 2.4 here, although it depends a bit on what you need. An older 2.3 release may provide some extra stability, but a lot of security features (Kerberos for Kafka-&amp;gt;Spark Streaming, etc.), a newer Spark release, and other goodies are in 2.4. Also, upgrading to a point version is normally easier than jumping releases. But again, it depends on your needs.&lt;/P&gt;&lt;P&gt;"Any issue with this configuration. Also do we need the client on all the machines ?"&lt;/P&gt;
&lt;P&gt;In most cases, no (a Sqoop action with Hive in Oozie needs the Hive clients on all nodes, but that is an exception).&lt;/P&gt;&lt;P&gt;Regarding your node distribution:&lt;/P&gt;&lt;P&gt;How many datanodes are you planning? I see 5 different node types, so I assume you want 3 master nodes, one edge node, and 3 datanodes?&lt;/P&gt;&lt;P&gt;That may make sense if you plan to grow the cluster later, but if you want to get the maximum amount of work done, I would rather go for 2 master nodes, perhaps even reusing one of them as an edge node, and have a decent number of datanodes. Obviously it depends on your server size as well, but big modern servers with 12+ cores and 256GB of RAM can host an awful lot of master components at the same time without creating a bottleneck. Others may disagree with me here, but I once set up a cluster with 7 datanodes plus 1 combined master+edge node (I didn't design it), and it worked fine as long as you do not expect constant uptime for your cluster. Colocating this many services increases the chance of something going wrong and of a single server reboot bringing down the whole cluster, so it's nothing you would do for a mission-critical system that cannot go down. If you have much smaller servers, though, you might need more master nodes as well.&lt;/P&gt;</description>
    <pubDate>Thu, 05 May 2016 02:09:07 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-05-05T02:09:07Z</dc:date>
    <item>
      <title>Recommendation of a Multi-Node cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Recommendation-of-a-Multi-Node-cluster/m-p/131787#M27249</link>
      <description>&lt;P&gt;I need a recommendation for a small 7-node cluster. Below is what I am planning to do:&lt;/P&gt;&lt;P&gt;MasterNode: NameNode, ResourceManager, HBase Master, Oozie Server, ZooKeeper Server&lt;/P&gt;&lt;P&gt;DataNode: DataNode, NodeManager, RegionServer&lt;/P&gt;&lt;P&gt;Web interface: Ambari server / HUE interface / Zeppelin / Ranger&lt;/P&gt;&lt;P&gt;Gateway Node: All the clients (HDFS, Hive, Spark, Pig, Mahout, Tez etc)&lt;/P&gt;&lt;P&gt;SecondaryNode: Secondary NameNode, HiveServer2, MySQL, WebHCat server, HiveMetaStore&lt;/P&gt;&lt;P&gt;Any issue with this configuration. Also do we need the client on all the machines ?&lt;/P&gt;&lt;P&gt;Should I go with HDP 2.3 or 2.4 ?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Prakash&lt;/P&gt;</description>
      <pubDate>Thu, 05 May 2016 01:51:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Recommendation-of-a-Multi-Node-cluster/m-p/131787#M27249</guid>
      <dc:creator>prakashpunj</dc:creator>
      <dc:date>2016-05-05T01:51:32Z</dc:date>
    </item>
    <item>
      <title>Re: Recommendation of a Multi-Node cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Recommendation-of-a-Multi-Node-cluster/m-p/131788#M27250</link>
      <description>&lt;P&gt;"Should I go with HDP 2.3 or 2.4 ?"&lt;/P&gt;&lt;P&gt;I would lean toward 2.4 here, although it depends a bit on what you need. An older 2.3 release may provide some extra stability, but a lot of security features (Kerberos for Kafka-&amp;gt;Spark Streaming, etc.), a newer Spark release, and other goodies are in 2.4. Also, upgrading to a point version is normally easier than jumping releases. But again, it depends on your needs.&lt;/P&gt;&lt;P&gt;"Any issue with this configuration. Also do we need the client on all the machines ?"&lt;/P&gt;
&lt;P&gt;In most cases, no (a Sqoop action with Hive in Oozie needs the Hive clients on all nodes, but that is an exception).&lt;/P&gt;&lt;P&gt;Regarding your node distribution:&lt;/P&gt;&lt;P&gt;How many datanodes are you planning? I see 5 different node types, so I assume you want 3 master nodes, one edge node, and 3 datanodes?&lt;/P&gt;&lt;P&gt;That may make sense if you plan to grow the cluster later, but if you want to get the maximum amount of work done, I would rather go for 2 master nodes, perhaps even reusing one of them as an edge node, and have a decent number of datanodes. Obviously it depends on your server size as well, but big modern servers with 12+ cores and 256GB of RAM can host an awful lot of master components at the same time without creating a bottleneck. Others may disagree with me here, but I once set up a cluster with 7 datanodes plus 1 combined master+edge node (I didn't design it), and it worked fine as long as you do not expect constant uptime for your cluster. Colocating this many services increases the chance of something going wrong and of a single server reboot bringing down the whole cluster, so it's nothing you would do for a mission-critical system that cannot go down. If you have much smaller servers, though, you might need more master nodes as well.&lt;/P&gt;</description>
      <pubDate>Thu, 05 May 2016 02:09:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Recommendation-of-a-Multi-Node-cluster/m-p/131788#M27250</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-05-05T02:09:07Z</dc:date>
    </item>
    <item>
      <title>Re: Recommendation of a Multi-Node cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Recommendation-of-a-Multi-Node-cluster/m-p/131789#M27251</link>
      <description>&lt;P&gt;Thank you so much, Benjamin. We are starting with a small cluster with 2 MasterNodes (32GB each), 4 DataNodes, and 1 EdgeNode.&lt;/P&gt;</description>
      <pubDate>Thu, 05 May 2016 02:55:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Recommendation-of-a-Multi-Node-cluster/m-p/131789#M27251</guid>
      <dc:creator>prakashpunj</dc:creator>
      <dc:date>2016-05-05T02:55:32Z</dc:date>
    </item>
  </channel>
</rss>

