Run multiple hive query from pyspark on the same session

I am trying to run a Hive query from PySpark. I am on Hortonworks, so I need to use the Hive Warehouse Connector.

Running one query, or even several, is easy and works. My problem is that I want to issue set commands before the main query, for instance to set the DAG name shown in the Tez UI:

set relevant

or to set up some memory configuration:

set hive.tez.container.size = 8192

For these statements to take effect, they need to run in the same session as the main query, and that is my issue.


I tried 2 ways:


The first was to create a new Hive session for each query, with a suitably configured URL, e.g.:


builder = HiveWarehouseSession.session(self.spark)
hive =
hive.execute("select * from whatever")


It works well for the first query, but the same URL is reused for the next one (even if I manually delete builder and hive), so the settings for later queries are never applied.
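To make the "suitably configured URL" idea concrete, here is a minimal sketch of how such a URL could be built. It assumes the URL is a standard HiveServer2 JDBC URL, where Hive configuration properties can be appended after a `?` as a semicolon-separated list. The helper name `with_hive_conf` and the host in the usage line are hypothetical, and whether the Hive Warehouse Connector picks up a changed URL on a later session is exactly the open question here.

```python
def with_hive_conf(base_url, settings):
    """Return base_url with the given settings added to its Hive conf list.

    Assumes the HiveServer2 JDBC layout:
    jdbc:hive2://host:port/db;session_vars?hive_conf_list#hive_var_list
    (with_hive_conf is a hypothetical helper, not part of any library).
    """
    # Build "key=value;key=value" from the requested settings.
    conf = ";".join(f"{key}={value}" for key, value in settings.items())
    if not conf:
        return base_url
    # Keep any "#hive_var_list" suffix at the end of the URL.
    head, sep, hive_vars = base_url.partition("#")
    # Start the conf list with "?" or extend an existing one with ";".
    joiner = ";" if "?" in head else "?"
    return f"{head}{joiner}{conf}{sep}{hive_vars}"


# Example host and port are placeholders.
url = with_hive_conf(
    "jdbc:hive2://hs2-host:10000/default",
    {"hive.tez.container.size": 8192},
)
print(url)
```

Settings passed this way apply to the whole connection, so they would cover every query run through a session opened with that URL.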


The second way is to set


globally in the Spark Thrift Server. This does seem to work, but it seems a shame to constrain the global Spark Thrift Server for the benefit of a single application.


Is there a way to achieve what I am looking for? Maybe there could be a way to pin a query to one executor, so hopefully one session?


Re: Run multiple hive query from pyspark on the same session

Sorry that I am not answering your question directly, but I am wondering why you want to run Hive queries through PySpark. Why not just use Spark SQL?