Support Questions
Find answers, ask questions, and share your expertise
Announcements

How multiple users run spark concurrently?


Contributor

Scenario:

User X runs a %livy.pyspark job in notebook AnalysisX.

Five seconds later, user Y runs a %livy.pyspark job in notebook AnalysisY.

Y has to wait for X's Spark job to finish, which is inefficient.

How is it possible in HDP 2.5, through Livy with impersonation enabled, to run multiple Spark jobs from Zeppelin at the same time?

2 REPLIES

Re: How multiple users run spark concurrently?

Contributor

I think multi-user should run fine; however, I suspect a resource allocation issue here. Zeppelin only supports yarn-client mode for the Spark interpreter, which means the driver runs on the same host as the Zeppelin server. And if you run the Spark interpreter in shared mode, all users share the same SparkContext. You should increase the executor memory and executor cores in the interpreter settings.
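For the Livy interpreter specifically, each user should get their own Livy session, which Livy launches as a separate YARN application; two sessions can then run in parallel as long as the YARN queue has capacity for both. A minimal sketch of the per-user session request Zeppelin issues against the Livy REST API (host and user names are hypothetical; `kind` and `proxyUser` are standard Livy session fields):

```python
import json

LIVY_URL = "http://HOST:8998"  # same endpoint as zeppelin.livy.url above

def session_payload(user):
    # One POST /sessions per user. Livy starts a separate Spark
    # application for each session, so sessions for different users
    # can run concurrently if YARN has room for both.
    return {
        "kind": "pyspark",   # matches the %livy.pyspark paragraphs
        "proxyUser": user,   # impersonation: run the job as the notebook user
    }

# Two users -> two independent sessions -> two YARN applications
payload_x = session_payload("userX")
payload_y = session_payload("userY")
print(json.dumps(payload_x), json.dumps(payload_y))
```

If both sessions are created but the second one's YARN application sits in ACCEPTED/PENDING, the bottleneck is queue capacity rather than Zeppelin or Livy serializing the jobs.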

Re: How multiple users run spark concurrently?

Contributor

Hi @Rishi

The interpreter settings are as follows:

  1. livy.spark.driver.cores = 1
  2. livy.spark.driver.memory = 12G
  3. livy.spark.dynamicAllocation.cachedExecutorIdleTimeout
  4. livy.spark.dynamicAllocation.enabled
  5. livy.spark.dynamicAllocation.initialExecutors
  6. livy.spark.dynamicAllocation.maxExecutors
  7. livy.spark.dynamicAllocation.minExecutors
  8. livy.spark.executor.cores = 4
  9. livy.spark.executor.instances = 11
  10. livy.spark.executor.memory = 12G
  11. livy.spark.master = yarn-cluster
  12. spark.driver.maxResultSize = 120G
  13. zeppelin.interpreter.localRepo = (HIDDEN)
  14. zeppelin.livy.concurrentSQL = false
  15. zeppelin.livy.create.session.retries = 240
  16. zeppelin.livy.keytab = (HIDDEN)
  17. zeppelin.livy.principal = (HIDDEN)
  18. zeppelin.livy.spark.sql.maxResult = 100000
  19. zeppelin.livy.url = http://HOST:8998

It is very clear that job Y stays PENDING, and only when X finishes does Y start.
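One likely cause of the PENDING state is simple arithmetic on the settings above: each Livy session asks YARN for a driver plus a full set of executors, so two concurrent sessions need roughly double that. A back-of-the-envelope estimate (ignoring the per-container YARN memory overhead, which adds roughly 10% on top):

```python
# Rough per-session YARN memory demand implied by the interpreter settings.
driver_gb = 12     # livy.spark.driver.memory = 12G
executor_gb = 12   # livy.spark.executor.memory = 12G
executors = 11     # livy.spark.executor.instances = 11

per_session_gb = driver_gb + executors * executor_gb
two_sessions_gb = 2 * per_session_gb
print(per_session_gb, two_sessions_gb)  # 144 288
```

If the YARN queue cannot hold ~288 GB at once, the second session's application is accepted but left pending until the first releases its containers. Lowering executor instances/memory, or relying on dynamic allocation with a smaller minExecutors, would let both sessions fit.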