<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question HiveWarehouseSession vs SQLContext spark execution in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/HiveWarehouseSession-vs-SQLContext-spark-execution/m-p/360200#M238327</link>
    <description>&lt;P&gt;Can someone explain what is different about the two Spark execution engines below?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Environment: CDP private cluster&lt;/P&gt;&lt;P&gt;Spark version 2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have a full ACID Hive managed table that we need to access from a Spark ETL job. We followed the documentation below to connect through the Hive Warehouse Connector (HWC):&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html" target="_blank" rel="noopener"&gt;https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Apart from the Hive Warehouse Connector being needed to access ACID tables, what execution differences are there between the two submissions? We don't see any DAG in the Spark History Server, and the query takes about three times as long as a similar query through&amp;nbsp;SQLContext against a non-ACID managed table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;from pyspark_llap import HiveWarehouseSession&lt;BR /&gt;hive = HiveWarehouseSession.session(spark).build()&lt;/P&gt;&lt;P&gt;df = hive.sql("select * from incidents LIMIT 100")&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;df.show(10)&lt;/P&gt;&lt;P&gt;# additional Spark transformation code...&lt;/P&gt;&lt;P&gt;# NO DAG in Spark History Server, slower, uses more memory&lt;/P&gt;&lt;P&gt;__________________________&lt;/P&gt;&lt;P&gt;The same pattern using SQLContext:&lt;/P&gt;&lt;P&gt;from pyspark.sql import SQLContext&lt;/P&gt;&lt;P&gt;sqlSparkContext = SQLContext(spark.sparkContext)&lt;/P&gt;&lt;P&gt;df = sqlSparkContext.sql("select * from incidents LIMIT 100")&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;df.show(10)&lt;/P&gt;&lt;P&gt;# additional Spark transformation code...&lt;/P&gt;&lt;P&gt;# SHOWS DAG in Spark History Server, faster&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Apart from Hive table access, can someone please explain the differences between Spark code using HiveWarehouseSession and Spark code using&amp;nbsp;SQLContext: where the code gets executed, which engines are in play, optimization, memory usage, etc.? I suspect&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 28 Dec 2022 12:40:48 GMT</pubDate>
    <dc:creator>aval</dc:creator>
    <dc:date>2022-12-28T12:40:48Z</dc:date>
    <item>
      <title>HiveWarehouseSession vs SQLContext spark execution</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveWarehouseSession-vs-SQLContext-spark-execution/m-p/360200#M238327</link>
      <description>&lt;P&gt;Can someone explain what is different about the two Spark execution engines below?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Environment: CDP private cluster&lt;/P&gt;&lt;P&gt;Spark version 2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have a full ACID Hive managed table that we need to access from a Spark ETL job. We followed the documentation below to connect through the Hive Warehouse Connector (HWC):&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html" target="_blank" rel="noopener"&gt;https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Apart from the Hive Warehouse Connector being needed to access ACID tables, what execution differences are there between the two submissions? We don't see any DAG in the Spark History Server, and the query takes about three times as long as a similar query through&amp;nbsp;SQLContext against a non-ACID managed table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;from pyspark_llap import HiveWarehouseSession&lt;BR /&gt;hive = HiveWarehouseSession.session(spark).build()&lt;/P&gt;&lt;P&gt;df = hive.sql("select * from incidents LIMIT 100")&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;df.show(10)&lt;/P&gt;&lt;P&gt;# additional Spark transformation code...&lt;/P&gt;&lt;P&gt;# NO DAG in Spark History Server, slower, uses more memory&lt;/P&gt;&lt;P&gt;__________________________&lt;/P&gt;&lt;P&gt;The same pattern using SQLContext:&lt;/P&gt;&lt;P&gt;from pyspark.sql import SQLContext&lt;/P&gt;&lt;P&gt;sqlSparkContext = SQLContext(spark.sparkContext)&lt;/P&gt;&lt;P&gt;df = sqlSparkContext.sql("select * from incidents LIMIT 100")&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;df.show(10)&lt;/P&gt;&lt;P&gt;# additional Spark transformation code...&lt;/P&gt;&lt;P&gt;# SHOWS DAG in Spark History Server, faster&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Apart from Hive table access, can someone please explain the differences between Spark code using HiveWarehouseSession and Spark code using&amp;nbsp;SQLContext: where the code gets executed, which engines are in play, optimization, memory usage, etc.? I suspect&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Dec 2022 12:40:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveWarehouseSession-vs-SQLContext-spark-execution/m-p/360200#M238327</guid>
      <dc:creator>aval</dc:creator>
      <dc:date>2022-12-28T12:40:48Z</dc:date>
    </item>
    <item>
      <title>Re: HiveWarehouseSession vs SQLContext spark execution</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveWarehouseSession-vs-SQLContext-spark-execution/m-p/361928#M238673</link>
      <description>&lt;P&gt;With HWC, the user query is processed by the HWC API, which connects to HiveServer2 (HS2); HS2 then executes the query either within HS2 itself or on Tez/LLAP daemons.&lt;/P&gt;&lt;P&gt;With the Spark API, Spark's own framework executes the query, fetching only the necessary table metadata from the Hive Metastore (HMS).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please refer to the articles below to learn more about HWC:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html" target="_blank"&gt;https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/t5/Community-Articles/Integrating-Apache-Hive-with-Apache-Spark-Hive-Warehouse/ta-p/249035" target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/Integrating-Apache-Hive-with-Apache-Spark-Hive-Warehouse/ta-p/249035&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jan 2023 16:52:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveWarehouseSession-vs-SQLContext-spark-execution/m-p/361928#M238673</guid>
      <dc:creator>tarak271</dc:creator>
      <dc:date>2023-01-20T16:52:56Z</dc:date>
    </item>
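    The answer above boils down to a routing rule, and a minimal sketch of that rule may make it more concrete. This is not Cloudera's implementation; it assumes that a table whose Hive property `transactional` is `true` (such as the full ACID managed table in the question) has to be read through HWC, while any other table can be read by Spark's own engine. The helper names `is_transactional`, `choose_read_path`, and `read_table` are hypothetical, and the table properties are passed in as a plain dictionary instead of being fetched from HMS.

```python
"""Hypothetical sketch: pick the read path for a Hive table.

Assumption (not from any library): tables with transactional=true go
through the Hive Warehouse Connector, which hands the query to HS2 for
execution in HS2 or on Tez/LLAP daemons; all other tables are read by
Spark's own engine, which only fetches table metadata from HMS.
"""


def is_transactional(table_properties):
    """Return True if the Hive table properties mark the table as ACID."""
    value = table_properties.get("transactional", "false")
    return str(value).strip().lower() == "true"


def choose_read_path(table_properties):
    """Return 'hwc' for transactional tables and 'spark' otherwise."""
    if is_transactional(table_properties):
        # The query runs outside Spark (HS2 or Tez/LLAP); Spark only
        # receives the result rows.
        return "hwc"
    # The query is planned and executed by Spark itself.
    return "spark"


def read_table(query, table_properties, hwc_sql, spark_sql):
    """Run `query` with whichever callable matches the chosen read path."""
    if choose_read_path(table_properties) == "hwc":
        return hwc_sql(query)
    return spark_sql(query)


if __name__ == "__main__":
    print(choose_read_path({"transactional": "true"}))  # hwc
    print(choose_read_path({"EXTERNAL": "TRUE"}))       # spark
```

    On a cluster, `hwc_sql` would be the `hive.sql` method of the `HiveWarehouseSession` built in the question and `spark_sql` would be `spark.sql`. The first path hands the query to HS2, so the scan runs in HS2 or Tez/LLAP rather than in Spark executors, which is consistent with the missing DAG reported in the question; the second path is planned and executed entirely by Spark.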
  </channel>
</rss>

