How can multiple users run Spark concurrently?

Contributor

Scenario:

User X runs a %livy.pyspark job in notebook AnalysisX.

Five seconds later, user Y runs a %livy.pyspark job in notebook AnalysisY.

Y has to wait for X's Spark job to finish, which is inefficient.

How is it possible, in HDP 2.5 with Livy impersonation, to run multiple Spark jobs from Zeppelin at the same time?
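Livy itself can host multiple sessions concurrently; with impersonation, each user should get an independent session (and hence an independent SparkContext). A minimal sketch against Livy's REST API, assuming the http://HOST:8998 endpoint from the settings below; userX and userY are placeholder names:

    import requests

    LIVY_URL = "http://HOST:8998"  # Livy endpoint from the interpreter settings

    def create_session(proxy_user):
        # Ask Livy for a new PySpark session impersonating proxy_user.
        resp = requests.post(
            f"{LIVY_URL}/sessions",
            json={"kind": "pyspark", "proxyUser": proxy_user},
        )
        resp.raise_for_status()
        return resp.json()["id"]

    # Two sessions created back to back; both can run at the same time
    # as long as YARN has capacity for both drivers and executor sets.
    session_x = create_session("userX")  # placeholder user names
    session_y = create_session("userY")
    print("sessions:", session_x, session_y)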

2 Replies

Expert Contributor

I think multi-user should run fine; however, I suspect a resource allocation issue here. Zeppelin only supports yarn-client mode for the Spark interpreter, which means the driver runs on the same host as the Zeppelin server. And if you run the Spark interpreter in shared mode, all users share the same SparkContext. You should increase the executor memory and executor cores in the interpreter settings.
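One way to check whether users are sharing one session or each getting their own is to list Livy's active sessions; a small sketch, again assuming the http://HOST:8998 endpoint (with impersonation on, each concurrent user should appear as a separate session):

    import requests

    LIVY_URL = "http://HOST:8998"  # same Livy endpoint as in the settings below

    # GET /sessions lists all active sessions; id, proxyUser, state,
    # and kind are fields Livy returns for each session.
    resp = requests.get(f"{LIVY_URL}/sessions")
    resp.raise_for_status()
    for s in resp.json()["sessions"]:
        print(s["id"], s.get("proxyUser"), s["state"], s["kind"])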

Contributor

Hi @Rishi,

The interpreter settings are as follows:

  1. livy.spark.driver.cores = 1
  2. livy.spark.driver.memory = 12G
  3. livy.spark.dynamicAllocation.cachedExecutorIdleTimeout
  4. livy.spark.dynamicAllocation.enabled
  5. livy.spark.dynamicAllocation.initialExecutors
  6. livy.spark.dynamicAllocation.maxExecutors
  7. livy.spark.dynamicAllocation.minExecutors
  8. livy.spark.executor.cores = 4
  9. livy.spark.executor.instances = 11
  10. livy.spark.executor.memory = 12G
  11. livy.spark.master = yarn-cluster
  12. spark.driver.maxResultSize = 120G
  13. zeppelin.interpreter.localRepo = (HIDDEN)
  14. zeppelin.livy.concurrentSQL = false
  15. zeppelin.livy.create.session.retries = 240
  16. zeppelin.livy.keytab = (HIDDEN)
  17. zeppelin.livy.principal = (HIDDEN)
  18. zeppelin.livy.spark.sql.maxResult = 100000
  19. zeppelin.livy.url = http://HOST:8998

It is very clear that Y's job stays PENDING, and only when X finishes does Y start.
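That PENDING state is consistent with a simple capacity calculation: with these settings, each session asks YARN for roughly 11 x 12 GB of executor memory plus a 12 GB driver, about 144 GB, so two concurrent sessions need around 288 GB before memory overhead. A quick sketch of that arithmetic (the actual queue capacity is unknown here and would need to be checked):

    # Rough per-session YARN demand implied by the settings above.
    executors = 11          # livy.spark.executor.instances
    executor_mem_gb = 12    # livy.spark.executor.memory
    driver_mem_gb = 12      # livy.spark.driver.memory

    per_session_gb = executors * executor_mem_gb + driver_mem_gb
    print(per_session_gb)       # 144 GB for one session
    print(2 * per_session_gb)   # 288 GB for two concurrent sessions

    # If the YARN queue holds less than this (plus memory overhead),
    # the second session waits in PENDING until the first releases
    # resources -- matching the behavior Y observes.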