<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive CLI and Beeline jdbc:hive2 behave differently in execution engine tez for insert million records? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215748#M81702</link>
    <description>&lt;P&gt;Dear &lt;A rel="user" href="https://community.cloudera.com/users/79158/vmurakami.html" nodeid="79158" target="_blank"&gt;@Vinicius Higa Murakami&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Sorry for the late response.&lt;/P&gt;&lt;P&gt;I have just compared the properties of Hive CLI and Beeline from the client machine.&lt;/P&gt;&lt;P&gt;The differences are in hive.exec.scratchdir and hive.exec.stagingdir.&lt;/P&gt;&lt;P&gt;I have uploaded a snapshot.&lt;/P&gt;&lt;P&gt;I tried the following command to locate the hive-site.xml used by Hive CLI, but grep returned no output.&lt;/P&gt;&lt;PRE&gt;hive --hiveconf hive.root.logger=DEBUG,console -e '' 2&amp;gt;&amp;amp;1 | grep hive-site.xml&lt;/PRE&gt;
&lt;P&gt;Please suggest how to make the hive-site.xml configuration the same for both executions.&lt;/P&gt;&lt;P&gt;Thanks and regards,&lt;/P&gt;&lt;P&gt;Manjil&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="85685-hivecli-beeline-prop-diff.jpeg" style="width: 1287px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/16288i5295650E9A3F4C9B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="85685-hivecli-beeline-prop-diff.jpeg" alt="85685-hivecli-beeline-prop-diff.jpeg" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 03:15:19 GMT</pubDate>
    <dc:creator>manjilhk</dc:creator>
    <dc:date>2019-08-18T03:15:19Z</dc:date>
    <item>
      <title>Hive CLI and Beeline jdbc:hive2 behave differently in execution engine tez for insert million records?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215744#M81698</link>
      <description>&lt;P&gt;When executing an INSERT into an empty table from a large table with millions of records (20 GB in size), the execution differs between Hive CLI and Beeline.&lt;/P&gt;&lt;P&gt;Hive CLI: it creates two Tez jobs in YARN, possibly a mapper and a reducer, and completes in approximately 413 seconds.&lt;/P&gt;&lt;P&gt;Beeline: it first creates one Tez job in YARN, followed by more than 150 MapReduce jobs, and takes almost 2 hours.&lt;/P&gt;&lt;P&gt;Is this the expected behavior of HiveServer2 with Beeline for a Tez job, given that it internally creates MapReduce jobs?&lt;/P&gt;&lt;P&gt;Environment details:&lt;/P&gt;&lt;P&gt;Hive version: 2.1.1&lt;/P&gt;&lt;P&gt;Tez version: 0.8.5&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/86389-hive-cli.txt"&gt;hive-cli.txt&lt;/A&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/86390-beeline-jdbc-hs2.txt"&gt;beeline-jdbc-hs2.txt&lt;/A&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/86391-beeline-jdbc-hs2.txt"&gt;beeline-jdbc-hs2.txt&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hive common settings:&lt;/P&gt;&lt;P&gt;hive.execution.engine=tez&lt;/P&gt;&lt;P&gt;hive.mv.files.thread=0&lt;/P&gt;&lt;P&gt;Beeline settings:&lt;/P&gt;&lt;P&gt;tez.am.resource.memory.mb=20000&lt;/P&gt;&lt;P&gt;mapreduce.map.memory.mb=20000&lt;/P&gt;&lt;P&gt;hive.vectorized.execution.reduce.enabled=false;&lt;/P&gt;&lt;P&gt;The Hive CLI and Beeline logs are uploaded.&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Aug 2018 16:32:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215744#M81698</guid>
      <dc:creator>manjilhk</dc:creator>
      <dc:date>2018-08-07T16:32:14Z</dc:date>
    </item>
    <item>
      <title>Re: Hive CLI and Beeline jdbc:hive2 behave differently in execution engine tez for insert million records?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215745#M81699</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/87340/manjilsubba.html" nodeid="87340"&gt;@manjil subba&lt;/A&gt;! &lt;BR /&gt;Just to check: did you apply the same parameters to both (Hive CLI and Beeline)?&lt;/P&gt;&lt;PRE&gt;tez.am.resource.memory.mb=20000
mapreduce.map.memory.mb=20000
hive.vectorized.execution.reduce.enabled=false;
&lt;/PRE&gt;&lt;P&gt;To answer your question: as far as I know, both should behave the same for this job. &lt;BR /&gt;The only difference is that Beeline goes through HS2/Thrift and Hive CLI does not. &lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Hope this helps! &lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 06:28:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215745#M81699</guid>
      <dc:creator>vmurakami</dc:creator>
      <dc:date>2018-08-10T06:28:05Z</dc:date>
    </item>
    <item>
      <title>Re: Hive CLI and Beeline jdbc:hive2 behave differently in execution engine tez for insert million records?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215746#M81700</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/79158/vmurakami.html" nodeid="79158"&gt;@Vinicius Higa Murakami&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Thanks for the response.&lt;/P&gt;&lt;P&gt;The parameters mentioned are used only for Beeline, since the job was failing in the MapReduce copy job with an error stating that virtual memory used was 18G while 16.2G was allocated in YARN.&lt;/P&gt;&lt;P&gt;To explain more about the difference observed in the Hive CLI and Beeline logs: the HDFS temporary directories used are different.&lt;/P&gt;&lt;P&gt;Is there any configuration we need to modify to make them the same?&lt;/P&gt;&lt;P&gt;Hive CLI: hdfs://edhcluster/user/hive/staging_hive_2018-08-07_18-22-53_167_2618699013418541798-1/-ext-10001&lt;/P&gt;&lt;P&gt;Beeline: hdfs://edhcluster/tmp/hive/staging_hive_2018-08-07_16-29-12_750_8973639287951385407-1/-ext-10001&lt;/P&gt;&lt;P&gt;Hive CLI log:&lt;/P&gt;&lt;P&gt;2018-08-07T18:22:56,601 INFO [main] exec.Utilities: Setting plan: /tmp/hive/scratch/hive/a501276d-2015-435b-85c5-4d40534ac162/hive_2018-08-07_18-22-53_167_2618699013418541798-1/hive/_tez_scratch_dir/d5cc1718-38b1-49ba-a97e-ab9f78415b62/map.xml &lt;/P&gt;&lt;P&gt;2018-08-07T18:22:56,669 INFO [main] fs.FSStatsPublisher: created : hdfs://edhcluster/user/hive/staging_hive_2018-08-07_18-22-53_167_2618699013418541798-1/-ext-10001 &lt;/P&gt;&lt;P&gt;2018-08-07T18:22:56,686 INFO [main] client.TezClient: Submitting dag to TezSession, sessionName=HIVE-a501276d-2015-435b-85c5-4d40534ac162, applicationId=application_1533623337748_0376, dagName=insert into default.t...db.temp_large_table3(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, &lt;/P&gt;&lt;P&gt;Beeline log:&lt;/P&gt;&lt;P&gt;2018-08-07T16:29:13,903  INFO [HiveServer2-Background-Pool: Thread-1549] exec.Utilities: Setting plan: /tmp/hive/scratch/hive/0887b266-675a-4fb2-8c85-3a27ebb3b9fc/hive_2018-08-07_16-29-12_750_8973639287951385407-3/hive/_tez_scratch_dir/6f4620d8-310c-4aff-bbe8-6f69ea9d1341/map.xml &lt;/P&gt;&lt;P&gt;2018-08-07T16:29:13,934  INFO [HiveServer2-Background-Pool: Thread-1549] fs.FSStatsPublisher: created : hdfs://edhcluster/tmp/hive/staging_hive_2018-08-07_16-29-12_750_8973639287951385407-1/-ext-10001 &lt;/P&gt;&lt;P&gt;2018-08-07T16:29:13,938  INFO [HiveServer2-Background-Pool: Thread-1549] client.TezClient: Submitting dag to TezSession, sessionName=HIVE-e2dfe4df-37f0-4d95-946d-30557075f807, applicationId=application_1533623337748_0148, dagName=insert into default.t...db.temp_large_table3(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=hive_20180807162912_519c1503-c151-4da7-b5a2-bd067e9c42b9 }&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;Manjil&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 08:37:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215746#M81700</guid>
      <dc:creator>manjilhk</dc:creator>
      <dc:date>2018-08-10T08:37:47Z</dc:date>
    </item>
    <item>
      <title>Re: Hive CLI and Beeline jdbc:hive2 behave differently in execution engine tez for insert million records?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215747#M81701</link>
      <description>&lt;P&gt;Hello &lt;A rel="user" href="https://community.cloudera.com/users/87340/manjilsubba.html" nodeid="87340"&gt;@manjil subba&lt;/A&gt;! &lt;BR /&gt;Sorry for the long delay. &lt;BR /&gt;I asked about those parameters because they seem a bit higher than usual (which is probably why Beeline is taking much longer than Hive CLI). Also, setting vectorization to false should affect the reduce phase as well. &lt;/P&gt;&lt;P&gt;You can tune Tez performance by following the link below:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html" target="_blank"&gt;https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;To compare the settings of both (Hive CLI and Beeline), you can run:&lt;/P&gt;&lt;PRE&gt;beeline -u 'jdbc:hive2://&amp;lt;HS2&amp;gt;:10000/default' -e "set;" &amp;gt; /tmp/beeline.properties
hive -e "set;" &amp;gt; /tmp/hivecli.properties
diff /tmp/beeline.properties /tmp/hivecli.properties&lt;/PRE&gt;&lt;P&gt;BTW, in your Beeline logs I didn't see the following message:&lt;/P&gt;&lt;PRE&gt;Closing Tez Session&lt;/PRE&gt;&lt;P&gt;Maybe we can first ensure that all parameters are equal (Beeline vs. Hive CLI) and then enable debug logging for Beeline to check what's going on. &lt;/P&gt;&lt;P&gt;Hope this helps! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; &lt;/P&gt;</description>
      <pubDate>Mon, 13 Aug 2018 21:08:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215747#M81701</guid>
      <dc:creator>vmurakami</dc:creator>
      <dc:date>2018-08-13T21:08:34Z</dc:date>
    </item>
    <item>
      <title>Re: Hive CLI and Beeline jdbc:hive2 behave differently in execution engine tez for insert million records?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215748#M81702</link>
      <description>&lt;P&gt;Dear &lt;A rel="user" href="https://community.cloudera.com/users/79158/vmurakami.html" nodeid="79158" target="_blank"&gt;@Vinicius Higa Murakami&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Sorry for the late response.&lt;/P&gt;&lt;P&gt;I have just compared the properties of Hive CLI and Beeline from the client machine.&lt;/P&gt;&lt;P&gt;The differences are in hive.exec.scratchdir and hive.exec.stagingdir.&lt;/P&gt;&lt;P&gt;I have uploaded a snapshot.&lt;/P&gt;&lt;P&gt;I tried the following command to locate the hive-site.xml used by Hive CLI, but grep returned no output.&lt;/P&gt;&lt;PRE&gt;hive --hiveconf hive.root.logger=DEBUG,console -e '' 2&amp;gt;&amp;amp;1 | grep hive-site.xml&lt;/PRE&gt;
&lt;P&gt;Please suggest how to make the hive-site.xml configuration the same for both executions.&lt;/P&gt;&lt;P&gt;Thanks and regards,&lt;/P&gt;&lt;P&gt;Manjil&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="85685-hivecli-beeline-prop-diff.jpeg" style="width: 1287px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/16288i5295650E9A3F4C9B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="85685-hivecli-beeline-prop-diff.jpeg" alt="85685-hivecli-beeline-prop-diff.jpeg" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 03:15:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215748#M81702</guid>
      <dc:creator>manjilhk</dc:creator>
      <dc:date>2019-08-18T03:15:19Z</dc:date>
    </item>
    <item>
      <title>Re: Hive CLI and Beeline jdbc:hive2 behave differently in execution engine tez for insert million records?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215749#M81703</link>
      <description>&lt;P&gt;Update:&lt;/P&gt;&lt;P&gt;It turned out that a .hiverc file was in use for the hive user with Hive CLI, which explains the difference.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;hive.exec.scratchdir=/user/hive/scratch&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;hive.exec.stagingdir=/user/hive/staging&lt;/P&gt;&lt;P&gt;The issue is that the HDFS /user/hive directory is encrypted with Ranger, while the HDFS /tmp/hive directory is not encrypted and can be read and written by all users in the hadoop group.&lt;/P&gt;&lt;P&gt;hive-site.xml:&lt;/P&gt;&lt;P&gt;     &amp;lt;property&amp;gt;&lt;/P&gt;&lt;P&gt;       &amp;lt;name&amp;gt;hive.security.authorization.sqlstd.confwhitelist.append&amp;lt;/name&amp;gt;&lt;/P&gt;&lt;P&gt;      &amp;lt;value&amp;gt;&lt;STRONG&gt;hive\.exec\.scratchdir|hive\.exec\.stagingdir&lt;/STRONG&gt;&amp;lt;/value&amp;gt;&lt;/P&gt;&lt;P&gt;      &amp;lt;description&amp;gt;append conf property in white list followed by pipeline&amp;lt;/description&amp;gt;&lt;/P&gt;&lt;P&gt;    &amp;lt;/property&amp;gt;&lt;/P&gt;&lt;P&gt;Restart the metastore and HiveServer2.&lt;/P&gt;&lt;P&gt;I tested Beeline with the following session-level change. The execution is as fast as with Hive CLI.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;hive.exec.scratchdir=/user/hive/scratch&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;hive.exec.stagingdir=/user/hive/staging&lt;/P&gt;&lt;P&gt;I tested Hive CLI with the following session-level change. The execution is slow, with MapReduce jobs used for moving the data.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;hive.exec.scratchdir=/tmp/hive/scratch&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;hive.exec.stagingdir=/tmp/hive/staging&lt;/P&gt;&lt;P&gt;So the root cause is that data is encrypted in /user/hive and not encrypted in /tmp/hive.&lt;/P&gt;&lt;P&gt;The solution is to make a session-level change so that the same encryption zone is used.&lt;/P&gt;&lt;P&gt;The following INFO log is printed if the encryption zones are different:&lt;/P&gt;&lt;P&gt;metadata.Hive: Copying source hdfs://edhcluster/tmp/hive/staging_hive_2018-08-07_16-29-12_750_8973639287951385407-1/-ext-10000/000001_0 to hdfs://edhcluster/user/hive/warehouse/temp_tro/000001_0 because HDFS encryption zones are different.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Manjil&lt;/P&gt;</description>
      <pubDate>Tue, 21 Aug 2018 17:08:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-CLI-and-Beeline-jdbc-hive2-behave-differently-in/m-p/215749#M81703</guid>
      <dc:creator>manjilhk</dc:creator>
      <dc:date>2018-08-21T17:08:42Z</dc:date>
    </item>
  </channel>
</rss>

