<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: On processing Large volumes tables  MR is performing better than TEZ, But All forums says its TEZ that always better than MR. Please suggest. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159923#M57220</link>
    <description>&lt;P&gt;1) Please define actual size and performance numbers that you encountered.&lt;/P&gt;&lt;P&gt;Ans.&lt;/P&gt;&lt;TABLE&gt;
 &lt;TBODY&gt;&lt;TR&gt;
  &lt;TD&gt;&lt;STRONG&gt;Data volume&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;Time elapsed for TEZ&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;Average time for TEZ&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;Time elapsed for MR&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;Average time for MR&lt;/STRONG&gt;&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD rowspan="3"&gt;1900 records&lt;/TD&gt;
  &lt;TD&gt;46.350 secs&lt;/TD&gt;
  &lt;TD rowspan="3"&gt;41.626 secs&lt;/TD&gt;
  &lt;TD&gt;63.666 secs&lt;/TD&gt;
  &lt;TD rowspan="3"&gt;56.176 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;40.341 secs&lt;/TD&gt;
  &lt;TD&gt;55.633 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;38.189 secs&lt;/TD&gt;
  &lt;TD&gt;49.230 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD rowspan="3"&gt;91914 records&lt;/TD&gt;
  &lt;TD&gt;32.049 secs&lt;/TD&gt;
  &lt;TD rowspan="3"&gt;32.097 secs&lt;/TD&gt;
  &lt;TD&gt;52.920 secs&lt;/TD&gt;
  &lt;TD rowspan="3"&gt;51.236 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;32.088 secs&lt;/TD&gt;
  &lt;TD&gt;49.030 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;32.156 secs&lt;/TD&gt;
  &lt;TD&gt;51.760 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD rowspan="4"&gt;993168 records&lt;/TD&gt;
  &lt;TD&gt;850.01 secs&lt;/TD&gt;
  &lt;TD rowspan="4"&gt;861.781 secs&lt;/TD&gt;
  &lt;TD&gt;611.625 secs&lt;/TD&gt;
  &lt;TD rowspan="4"&gt;635.781 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;865.230 secs&lt;/TD&gt;
  &lt;TD&gt;691.751 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;872.110 secs&lt;/TD&gt;
  &lt;TD&gt;672.285 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;868.995 secs&lt;/TD&gt;
  &lt;TD&gt;567.466 secs&lt;/TD&gt;
 &lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;2) Clarify which test beds you are referring to and how you used them.&lt;/P&gt;&lt;P&gt;Ans. In the statistics table above:&lt;/P&gt;&lt;P&gt;Operation 1 creates a lateral view on a small data set.&lt;/P&gt;&lt;P&gt;Operation 2 joins 3 tables of intermediate data volume.&lt;/P&gt;&lt;P&gt;Operation 3 joins 4 tables of large data volume in an inner query, with an aggregation on top of that.&lt;/P&gt;&lt;P&gt;3) Clarify what type of test case you executed. It is important to clarify because some tests can be disk I/O intensive, others can be memory intensive.&lt;/P&gt;&lt;P&gt;Ans.&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;The above jobs ran in parallel, i.e. 10 jobs in parallel in TEZ mode and 10 jobs in parallel in MR mode.&lt;/LI&gt;
&lt;LI&gt;The above results come from multiple test iterations performed on different test beds.&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Thu, 30 Mar 2017 12:23:36 GMT</pubDate>
    <dc:creator>vipulksaath</dc:creator>
    <dc:date>2017-03-30T12:23:36Z</dc:date>
    <item>
      <title>On processing Large volumes tables  MR is performing better than TEZ, But All forums says its TEZ that always better than MR. Please suggest.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159920#M57217</link>
      <description>&lt;P&gt;We are doing some analysis of MR vs. TEZ. TEZ does better than MR on small and medium data volumes, but MR beats TEZ on large volumes. We have seen this multiple times on different test beds. Please suggest.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Mar 2017 15:45:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159920#M57217</guid>
      <dc:creator>vipulksaath</dc:creator>
      <dc:date>2017-03-16T15:45:40Z</dc:date>
    </item>
    <item>
      <title>Re: On processing Large volumes tables  MR is performing better than TEZ, But All forums says its TEZ that always better than MR. Please suggest.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159921#M57218</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/16658/vipulksaath.html"&gt;Vipul Choudhary&lt;/A&gt;&lt;/P&gt;&lt;P&gt;1) Please define actual size and performance numbers that you encountered. &lt;/P&gt;&lt;P&gt;2) Clarify which test beds you are referring to and how you used them.&lt;/P&gt;&lt;P&gt;3) Clarify what type of test case you executed. It is important to clarify because some tests can be disk I/O intensive, others can be memory intensive.&lt;/P&gt;&lt;P&gt;After clarifying all the above, we can state that riding a bike is sometimes faster than driving a Ferrari. That may be because the bike is better suited for niche cases where there is little space for a car to go through (narrow roads, etc.). I would not generalize that easily. I am not sure about anything stated as "is always better". There is always an exception. Anyhow, you can set the desired engine at the session level, if you wish to use MR or Tez (for example, set hive.execution.engine=mr; or set hive.execution.engine=tez;). Thus, for cases where MR performs better, use it. It is not as if you have to code it when you execute a Hive query.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 08:54:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159921#M57218</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2017-03-17T08:54:13Z</dc:date>
    </item>
    <item>
      <title>Re: On processing Large volumes tables  MR is performing better than TEZ, But All forums says its TEZ that always better than MR. Please suggest.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159922#M57219</link>
      <description>&lt;P&gt;Great analogy, but I only have a bike! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; I'd like to be able to say "set my.transport.engine=ferrari;" and there it is, at my front door!&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 09:12:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159922#M57219</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2017-03-17T09:12:02Z</dc:date>
    </item>
    <item>
      <title>Re: On processing Large volumes tables  MR is performing better than TEZ, But All forums says its TEZ that always better than MR. Please suggest.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159923#M57220</link>
      <description>&lt;P&gt;1) Please define actual size and performance numbers that you encountered.&lt;/P&gt;&lt;P&gt;Ans.&lt;/P&gt;&lt;TABLE&gt;
 &lt;TBODY&gt;&lt;TR&gt;
  &lt;TD&gt;&lt;STRONG&gt;Data volume&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;Time elapsed for TEZ&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;Average time for TEZ&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;Time elapsed for MR&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;Average time for MR&lt;/STRONG&gt;&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD rowspan="3"&gt;1900 records&lt;/TD&gt;
  &lt;TD&gt;46.350 secs&lt;/TD&gt;
  &lt;TD rowspan="3"&gt;41.626 secs&lt;/TD&gt;
  &lt;TD&gt;63.666 secs&lt;/TD&gt;
  &lt;TD rowspan="3"&gt;56.176 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;40.341 secs&lt;/TD&gt;
  &lt;TD&gt;55.633 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;38.189 secs&lt;/TD&gt;
  &lt;TD&gt;49.230 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD rowspan="3"&gt;91914 records&lt;/TD&gt;
  &lt;TD&gt;32.049 secs&lt;/TD&gt;
  &lt;TD rowspan="3"&gt;32.097 secs&lt;/TD&gt;
  &lt;TD&gt;52.920 secs&lt;/TD&gt;
  &lt;TD rowspan="3"&gt;51.236 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;32.088 secs&lt;/TD&gt;
  &lt;TD&gt;49.030 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;32.156 secs&lt;/TD&gt;
  &lt;TD&gt;51.760 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD rowspan="4"&gt;993168 records&lt;/TD&gt;
  &lt;TD&gt;850.01 secs&lt;/TD&gt;
  &lt;TD rowspan="4"&gt;861.781 secs&lt;/TD&gt;
  &lt;TD&gt;611.625 secs&lt;/TD&gt;
  &lt;TD rowspan="4"&gt;635.781 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;865.230 secs&lt;/TD&gt;
  &lt;TD&gt;691.751 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;872.110 secs&lt;/TD&gt;
  &lt;TD&gt;672.285 secs&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;868.995 secs&lt;/TD&gt;
  &lt;TD&gt;567.466 secs&lt;/TD&gt;
 &lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;2) Clarify which test beds you are referring to and how you used them.&lt;/P&gt;&lt;P&gt;Ans. In the statistics table above:&lt;/P&gt;&lt;P&gt;Operation 1 creates a lateral view on a small data set.&lt;/P&gt;&lt;P&gt;Operation 2 joins 3 tables of intermediate data volume.&lt;/P&gt;&lt;P&gt;Operation 3 joins 4 tables of large data volume in an inner query, with an aggregation on top of that.&lt;/P&gt;&lt;P&gt;3) Clarify what type of test case you executed. It is important to clarify because some tests can be disk I/O intensive, others can be memory intensive.&lt;/P&gt;&lt;P&gt;Ans.&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;The above jobs ran in parallel, i.e. 10 jobs in parallel in TEZ mode and 10 jobs in parallel in MR mode.&lt;/LI&gt;
&lt;LI&gt;The above results come from multiple test iterations performed on different test beds.&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Thu, 30 Mar 2017 12:23:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159923#M57220</guid>
      <dc:creator>vipulksaath</dc:creator>
      <dc:date>2017-03-30T12:23:36Z</dc:date>
    </item>
    <item>
      <title>Re: On processing Large volumes tables  MR is performing better than TEZ, But All forums says its TEZ that always better than MR. Please suggest.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159924#M57221</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3486/cstanca.html" nodeid="3486"&gt;@Constantin Stanca&lt;/A&gt; Any thoughts on this?&lt;/P&gt;</description>
      <pubDate>Sat, 01 Apr 2017 23:23:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/On-processing-Large-volumes-tables-MR-is-performing-better/m-p/159924#M57221</guid>
      <dc:creator>vipulksaath</dc:creator>
      <dc:date>2017-04-01T23:23:27Z</dc:date>
    </item>
  </channel>
</rss>

