<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: When I add a new rack some Impala queries became extremely slow! in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/71510#M80533</link>
    <description>&lt;P&gt;Synchronizing data between clusters can be accomplished via distcp, &lt;A href="https://www.cloudera.com/content/dam/www/marketing/resources/training/cloudera-enterprise-bdr-overview.png.landing.html" target="_self"&gt;BDR&lt;/A&gt;, or by ingesting data into both clusters simultaneously using 3rd party tools. The best tool depends on your use case, risk tolerance, and budget.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We don't recommend spanning clusters across large geographic regions (e.g. US to EU); network latency and bandwidth are usually not suitable and could easily result in the&amp;nbsp;slow query times you're experiencing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We DO support spanning clusters across AWS Availability Zones if certain conditions are met; see Appendix A of &lt;A href="http://tiny.cloudera.com/aws-ra" target="_self"&gt;Cloudera Enterprise Reference Architecture&amp;nbsp;for AWS Deployments&lt;/A&gt;&amp;nbsp;(PDF) for details. For comparison, the latency between AWS AZs is typically sub-millisecond.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Spanning bare metal clusters across multiple data centers will be addressed in the next release of &lt;A href="http://tiny.cloudera.com/metal-ra" target="_self"&gt;Cloudera Enterprise Reference Architecture for Bare Metal Deployments&lt;/A&gt;&amp;nbsp;(PDF), to coincide with C6. It will look similar to the AWS guidance, but with the additional caveat that network latency between sides should not exceed 10ms.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://kudu.apache.org/docs/known_issues.html" target="_self"&gt;Kudu does not support rack awareness&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Not all services provide HA.&lt;/P&gt;</description>
    <pubDate>Thu, 12 Jul 2018 13:16:53 GMT</pubDate>
    <dc:creator>alexm</dc:creator>
    <dc:date>2018-07-12T13:16:53Z</dc:date>
    <item>
      <title>When I add a new rack some Impala queries became extremely slow!</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/69859#M80529</link>
      <description>&lt;DIV&gt;&lt;DIV&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;I'm working on CDH v5.14.2/CM v5.14.1. I had a cluster with 15 nodes in one rack (/my_cluster/rack1) in a US data center, and query execution times were great (e.g. 2.8 secs). When I decided to extend the cluster with HA in mind, I added 10 nodes in a European data center and assigned them to a second rack (/my_cluster/rack2).&lt;BR /&gt;The problem is that when I start the Impala daemons on the 10 new nodes (rack2), the execution time of the same queries becomes extremely long (e.g. 4.5 min).&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;NB: I &lt;SPAN&gt;realize &lt;/SPAN&gt;that one rack must be faster than two separate racks, but in my case the difference is huge (about 100x)! And what about rack awareness in Hadoop?&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Here are the query profiles for the two cases:&lt;BR /&gt;1 rack (15 nodes):&lt;BR /&gt;&lt;A href="https://files.fm/f/nczbw432" target="_self"&gt;query profile - 2.8 sec&lt;/A&gt;&lt;BR /&gt;2 racks (15+10 nodes):&lt;BR /&gt;&lt;A href="https://files.fm/f/9wmjv2nu" target="_self"&gt;query profile - 4.5 min&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Thanks in advance.&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;</description>
      <pubDate>Fri, 16 Sep 2022 13:26:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/69859#M80529</guid>
      <dc:creator>AcharkiMed</dc:creator>
      <dc:date>2022-09-16T13:26:47Z</dc:date>
    </item>
    <item>
      <title>Re: When I add a new rack some Impala queries became extremely slow!</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/69866#M80530</link>
      <description>&lt;P&gt;This sounds like a result of the drastically increased link latency between your two "racks". While within a single rack you&amp;nbsp;should see latencies less than a millisecond, US-EU latencies will be around 150ms, depending on where in the US and EU your machines are located. Bandwidth between your locations is likely also much lower than between the racks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Impala currently does not do any rack-aware scheduling of I/O and data exchanges. In addition it is not optimized for high variance in link latencies and throughput. HDFS itself to my knowledge also makes no optimizations for such a case.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Frankly, I don't think you will see good performance in such a scenario. If you want to increase data availability, you could explore replicating the data between your locations while running queries in only one at a time. If you want to increase service availability, you can look into using a load balancer and switching from one cluster to the other in case of failure.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 15:52:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/69866#M80530</guid>
      <dc:creator>Lars Volker</dc:creator>
      <dc:date>2018-07-11T15:52:16Z</dc:date>
    </item>
    <item>
      <title>Re: When I add a new rack some Impala queries became extremely slow!</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/69883#M80531</link>
      <description>&lt;DIV&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/13477"&gt;@Lars Volker&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks for your reply,&lt;BR /&gt;&lt;BR /&gt;Did you mean that I have to create &lt;STRONG&gt;two clusters&lt;/STRONG&gt; and synchronise the data between them? If yes, what is the best tool to do this? &lt;U&gt;Peers&lt;/U&gt;, the &lt;U&gt;DistCp&lt;/U&gt; HDFS command, or another technique?&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;What if I put the second rack in a different data center but in the &lt;STRONG&gt;same country&lt;/STRONG&gt; (US for example): will the latency be reasonable or not? Or is the only solution to have the two racks in the same data center?&lt;/DIV&gt;&lt;DIV&gt;What about &lt;STRONG&gt;Kudu&lt;/STRONG&gt;? Does it have any rack awareness?&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;Is there any &lt;U&gt;Cloudera documentation&lt;/U&gt; about multi-datacenter/multi-rack architecture?&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;Remark&lt;/EM&gt;: In fact, I don't know why there is an HA config in all services if there &lt;U&gt;&lt;EM&gt;is no rack awareness&lt;/EM&gt;&lt;/U&gt;.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Thanks again.&lt;/DIV&gt;</description>
      <pubDate>Wed, 11 Jul 2018 18:14:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/69883#M80531</guid>
      <dc:creator>AcharkiMed</dc:creator>
      <dc:date>2018-07-11T18:14:01Z</dc:date>
    </item>
    <item>
      <title>Re: When I add a new rack some Impala queries became extremely slow!</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/69896#M80532</link>
      <description>&lt;P&gt;Yes, creating two clusters is what you could try. I'm no expert in setting this up and unfortunately I also don't have good advice on&amp;nbsp;which tooling to use. distcp certainly&amp;nbsp;could be worth a try.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Within a country&amp;nbsp;your experience will depend on where your machines are, and you'll likely also be affected by reduced bandwidth between data centers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not sure about other services' behavior when running across racks. Impala is not (yet) rack-aware in its scheduling and exchanges. However, even once we get to adding support for rack-awareness, we might assume that the racks are within a single data-center.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 21:42:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/69896#M80532</guid>
      <dc:creator>Lars Volker</dc:creator>
      <dc:date>2018-07-11T21:42:37Z</dc:date>
    </item>
    <item>
      <title>Re: When I add a new rack some Impala queries became extremely slow!</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/71510#M80533</link>
      <description>&lt;P&gt;Synchronizing data between clusters can be accomplished via distcp, &lt;A href="https://www.cloudera.com/content/dam/www/marketing/resources/training/cloudera-enterprise-bdr-overview.png.landing.html" target="_self"&gt;BDR&lt;/A&gt;, or by ingesting data into both clusters simultaneously using 3rd party tools. The best tool depends on your use case, risk tolerance, and budget.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We don't recommend spanning clusters across large geographic regions (e.g. US to EU); network latency and bandwidth are usually not suitable and could easily result in the&amp;nbsp;slow query times you're experiencing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We DO support spanning clusters across AWS Availability Zones if certain conditions are met; see Appendix A of &lt;A href="http://tiny.cloudera.com/aws-ra" target="_self"&gt;Cloudera Enterprise Reference Architecture&amp;nbsp;for AWS Deployments&lt;/A&gt;&amp;nbsp;(PDF) for details. For comparison, the latency between AWS AZs is typically sub-millisecond.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Spanning bare metal clusters across multiple data centers will be addressed in the next release of &lt;A href="http://tiny.cloudera.com/metal-ra" target="_self"&gt;Cloudera Enterprise Reference Architecture for Bare Metal Deployments&lt;/A&gt;&amp;nbsp;(PDF), to coincide with C6. It will look similar to the AWS guidance, but with the additional caveat that network latency between sides should not exceed 10ms.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://kudu.apache.org/docs/known_issues.html" target="_self"&gt;Kudu does not support rack awareness&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Not all services provide HA.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jul 2018 13:16:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/When-I-add-a-new-rack-some-Impala-queries-became-extremely/m-p/71510#M80533</guid>
      <dc:creator>alexm</dc:creator>
      <dc:date>2018-07-12T13:16:53Z</dc:date>
    </item>
  </channel>
</rss>

