
HWC execution in Spark: no Spark UI


We have a full ACID Hive managed table that we need to access from Spark ETL. We followed the documentation below to connect through the Hive Warehouse Connector (HWC):

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehousecon...

Apart from HWC being needed to access the ACID tables, what differences in Spark execution are there between using the JDBC HWC connector with HiveWarehouseSession and using SparkContext without HWC? With HWC we don't see any information in the Spark UI or Spark History Server, and the query takes about three times as long as a similar query from SQLContext against a non-ACID managed table.

 

from pyspark_llap import HiveWarehouseSession

# `spark` is the existing SparkSession
hive = HiveWarehouseSession.session(spark).build()
df = hive.sql("select * from incidents LIMIT 100")
...
df.show(10)
# additional Spark transformation code ...
# No DAG in the Spark History Server; slower and uses more memory

__________________________

The same pattern using SQLContext:

from pyspark.sql import SQLContext

# `spark` is the existing SparkSession
sqlSparkContext = SQLContext(spark.sparkContext)
df = sqlSparkContext.sql("select * from incidents LIMIT 100")
...
df.show(10)
# additional Spark transformation code ...
# Shows a DAG in the Spark UI / Spark History Server; faster

 

Can someone please explain the differences, apart from Hive table access, between Spark code that uses HiveWarehouseSession and Spark code that uses SQLContext: where the code gets executed, which engines are in play, optimization, memory usage, etc.?

Does setting "spark.sql.hive.hwc.execution.mode"=spark change the Spark map/reduce execution?

 

1 REPLY


@aval 

Can you please confirm which version of Cloudera you are currently on?

Basically, HWC is required when you want to access managed tables from Spark.

Also, the use of spark.sql.hive.hwc.execution.mode is deprecated as of CDP 7.1.7:

 

https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive-hwc-reade...
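
To make the deprecation note a bit more concrete, here is a rough sketch of how the read mode might be passed to spark-submit on newer releases. The property name spark.datasource.hive.warehouse.read.mode and the mode values listed are my assumptions from memory of the CDP documentation, not something stated in this thread, and the helper hwc_read_mode_conf is hypothetical; please verify both against the documentation for your exact version before relying on them.

```python
# Sketch only. READ_MODE_KEY and KNOWN_READ_MODES are assumptions to verify
# against the CDP documentation for your release; they are not from this thread.
READ_MODE_KEY = "spark.datasource.hive.warehouse.read.mode"
KNOWN_READ_MODES = ("DIRECT_READER_V1", "JDBC_CLUSTER", "JDBC_CLIENT")


def hwc_read_mode_conf(mode):
    """Return spark-submit arguments selecting an HWC read mode (hypothetical helper)."""
    normalized = mode.strip().upper()
    if normalized not in KNOWN_READ_MODES:
        raise ValueError(f"unknown HWC read mode: {mode!r}")
    return ["--conf", f"{READ_MODE_KEY}={normalized}"]


if __name__ == "__main__":
    # Print the arguments so they can be pasted into a spark-submit command line.
    print(" ".join(hwc_read_mode_conf("jdbc_cluster")))
```

The helper only builds the command-line arguments and rejects unrecognized values early; it does not contact the cluster or change how an existing session runs.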