Run multiple Hive queries from PySpark in the same session


I am trying to run a Hive query with PySpark. I am using Hortonworks, so I need to use the Hive Warehouse Connector.

Running one or even several queries is easy and works. My problem is that I want to issue SET commands beforehand, for instance to set the DAG name in the Tez UI:

set relevant

or to set up some memory configuration:

set hive.tez.container.size = 8192

For these statements to take effect, they need to run in the same session as the main query, and that is my issue.
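To make "the same session" concrete: a session-scoped SET only affects statements sent later over the same HiveServer2 session, so the SET commands and the query must go through one connection, in order. The sketch below shows that ordering against a generic PEP 249 (DB-API) cursor rather than the Hive Warehouse Connector. It assumes you can open such a cursor directly against HiveServer2 (for example with PyHive, which is a swapped-in library, not part of HWC); the helper names and the `RecordingCursor` stub are hypothetical and exist only for illustration.

```python
from typing import List, Mapping, Protocol, Tuple


class Cursor(Protocol):
    """Minimal subset of a PEP 249 (DB-API) cursor used by this sketch."""

    def execute(self, operation: str) -> None: ...

    def fetchall(self) -> List[Tuple]: ...


def set_statements(settings: Mapping[str, str]) -> List[str]:
    """Turn {'hive.tez.container.size': '8192'} into ['SET hive.tez.container.size=8192']."""
    return [f"SET {key}={value}" for key, value in settings.items()]


def run_in_one_session(cursor: Cursor, settings: Mapping[str, str], query: str) -> List[Tuple]:
    """Send the SET statements, then the query, over the same cursor.

    Assumption: the cursor is bound to a single HiveServer2 session, so the
    session-level settings are still in effect when the query executes.
    """
    for statement in set_statements(settings):
        cursor.execute(statement)
    cursor.execute(query)
    return cursor.fetchall()


class RecordingCursor:
    """Hypothetical stand-in cursor that records statements instead of talking to Hive."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def execute(self, operation: str) -> None:
        self.executed.append(operation)

    def fetchall(self) -> List[Tuple]:
        return [("row",)]


demo_cursor = RecordingCursor()
demo_rows = run_in_one_session(
    demo_cursor, {"hive.tez.container.size": "8192"}, "select * from whatever"
)
```

With a real connection (assuming PyHive is installed and HiveServer2 is reachable), you would pass `hive.connect(host=..., port=10000).cursor()` from `pyhive` instead of the stub; PyHive's `Connection` also accepts a `configuration=` dict that is applied when the session is opened. Note that this bypasses HWC entirely, so it only fits if a direct HiveServer2 connection is acceptable in your setup.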


I tried two approaches:


The first was to create a new Hive session for each query, with a properly set-up URL, e.g.:


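from pyspark_llap import HiveWarehouseSession

builder = HiveWarehouseSession.session(self.spark)
hive = ...  # session built from builder with the per-query URL (details not shown in the original post)
hive.execute("select * from whatever")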


It works for the first query, but the same URL is reused for the next one (even when I manually delete builder and hive), so the settings are not applied.


The second way is to set


globally in the Spark Thrift Server. This does seem to work, but it feels wrong to constrain the global Spark Thrift Server for the benefit of a single application.


Is there a way to achieve this? Perhaps there is a way to pin the queries to a single executor, so that they hopefully share one session?


Re: Run multiple hive query from pyspark on the same session

Sorry for not answering your question directly, but why do you want to run Hive queries through PySpark? Why not just use Spark SQL?