How can multiple users run Spark jobs concurrently?



User X runs a %livy.pyspark job in notebook AnalysisX.

Five seconds later, user Y runs a %livy.pyspark job in notebook AnalysisY.

Y has to wait for X's Spark job to finish, which is inefficient.

How is it possible in HDP 2.5, through Livy with impersonation, to run multiple Spark jobs from Zeppelin at the same time?



I think multi-user should run fine; however, I suspect a resource allocation issue here. Zeppelin only supports yarn-client mode for the Spark interpreter, which means the driver runs on the same host as the Zeppelin server. And if you run the Spark interpreter in shared mode, all users share the same SparkContext. You should increase the executor size and executor cores in the interpreter settings.


Hi @Rishi

The interpreter settings are as follows:

  1. livy.spark.driver.cores = 1
  2. livy.spark.driver.memory = 12G
  3. livy.spark.dynamicAllocation.cachedExecutorIdleTimeout
  4. livy.spark.dynamicAllocation.enabled
  5. livy.spark.dynamicAllocation.initialExecutors
  6. livy.spark.dynamicAllocation.maxExecutors
  7. livy.spark.dynamicAllocation.minExecutors
  8. livy.spark.executor.cores = 4
  9. livy.spark.executor.instances = 11
  10. livy.spark.executor.memory = 12G
  11. livy.spark.master = yarn-cluster
  12. spark.driver.maxResultSize = 120G
  13. zeppelin.interpreter.localRepo = (HIDDEN)
  14. zeppelin.livy.concurrentSQL = false
  15. zeppelin.livy.create.session.retries = 240
  16. zeppelin.livy.keytab = (HIDDEN)
  17. zeppelin.livy.principal = (HIDDEN)
  18. zeppelin.livy.spark.sql.maxResult = 100000
  19. zeppelin.livy.url = http://HOST:8998

It is very clear that the Y job shows PENDING, and only when X finishes does Y start.
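
To check whether this is a resource allocation issue, it may help to estimate how much of the YARN queue one Livy session asks for with the settings listed above. The following is only a rough sketch under stated assumptions: it assumes the fixed `livy.spark.executor.instances = 11` applies (the dynamic allocation values are not shown above), and it assumes the usual Spark-on-YARN memory overhead rule of max(384 MB, 10% of the heap) per container. The function names and the queue size in the last line are hypothetical and only for illustration; YARN also rounds each container up to its minimum allocation, which is ignored here.

```python
def container_mb(heap_gb, overhead_fraction=0.10, min_overhead_mb=384):
    """Approximate YARN container size in MB for a JVM with the given heap.

    Assumption: overhead = max(384 MB, 10% of heap), the common Spark-on-YARN default.
    """
    heap_mb = heap_gb * 1024
    overhead_mb = max(min_overhead_mb, int(heap_mb * overhead_fraction))
    return heap_mb + overhead_mb


def session_footprint(executors, executor_cores, executor_mem_gb,
                      driver_cores, driver_mem_gb):
    """Approximate memory and vcores requested by one Spark session in yarn-cluster mode."""
    executor_mb = container_mb(executor_mem_gb)
    driver_mb = container_mb(driver_mem_gb)  # in yarn-cluster mode the driver is also a YARN container
    return {
        "memory_mb": executors * executor_mb + driver_mb,
        "vcores": executors * executor_cores + driver_cores,
    }


def sessions_that_fit(queue_mb, queue_vcores, footprint):
    """How many sessions of this size fit into a queue at the same time."""
    return min(queue_mb // footprint["memory_mb"],
               queue_vcores // footprint["vcores"])


# Values taken from the interpreter settings listed above.
footprint = session_footprint(executors=11, executor_cores=4, executor_mem_gb=12,
                              driver_cores=1, driver_mem_gb=12)
print(footprint)

# Hypothetical queue of 200 GB and 64 vcores, for illustration only.
print(sessions_that_fit(200 * 1024, 64, footprint))
```

If the queue that X and Y share is not at least twice the size of one session's footprint, Y's session would stay pending until X's session releases its containers, which would match the behaviour described above. Comparing the estimate with the actual queue capacity shown in the YARN ResourceManager UI is one way to confirm or rule this out.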