Spark2 taking longer time than spark1

I am upgrading Spark from 1.6 to 2.3, and as part of the upgrade I made some configuration changes.

Since then, the job takes 3-4 times longer to complete than it did on Spark 1.

I compared both versions and identified the specific point where Spark 2 seems to be spending the extra time:


val filtersDf: List[DataFrame] = inputData.filter(_.isDefined).map(_.get)
// Convert modified po dataframes to json (`logger` assumed to be an SLF4J logger)
val postProcessModifiedPoList = cleanedObjModifiedPoLists.foldLeft(Seq[String]())((modVoJSONs, data) => {
  logger.info("Convert to json: dataframe count {}", data.count().toString)
  logger.info("Convert to json: dataframe columns {}", data.columns.mkString(","))
  logger.info("Convert to json: Size of dataframe {}", SizeEstimator.estimate(data).toString)
  modVoJSONs ++ data.toJSON.collect()
})
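A minimal sketch of the same conversion, assuming Spark 2.3 with spark-sql on the classpath and DataFrames small enough to collect on the driver (the original code already collects them). `ToJsonSketch` and `toJsonLines` are hypothetical names, not part of the original job. Each `count()` and `toJSON.collect()` is a separate Spark action, so without caching the DataFrame's lineage is evaluated twice; `persist()` lets the second action reuse the first. The sketch also leaves out `SizeEstimator.estimate`, which sizes the JVM object graph reachable from the driver-side `DataFrame` object rather than its rows; that is a plausible, unconfirmed reason for the ~500 MB reported for a 4-row DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ToJsonSketch {
  // Hypothetical helper: converts each DataFrame to JSON strings on the driver.
  // persist() lets count() and toJSON.collect() share one computation
  // instead of evaluating each DataFrame's lineage twice.
  def toJsonLines(dfs: Seq[DataFrame]): Seq[String] =
    dfs.flatMap { df =>
      val cached = df.persist()
      val n = cached.count()              // first action: materializes the cache
      val json = cached.toJSON.collect()  // second action: served from the cache
      cached.unpersist()
      require(json.length == n, "row count changed between actions")
      json.toSeq
    }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the real job runs on a cluster.
    val spark = SparkSession.builder().master("local[2]").appName("to-json-sketch").getOrCreate()
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    toJsonLines(Seq(df)).foreach(println)
    spark.stop()
  }
}
```

Using `flatMap` instead of `foldLeft` with `++` produces the same concatenated sequence of JSON strings without rebuilding the accumulator at every step.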


When I checked the logs, the job seemed to hang at one point: for 10-15 minutes it logged only the lines below while it started removing executors. As a result, only 3 executors are left to process the final stage, which then takes longer. We are using dynamic allocation with a maximum of 30 executors and a minimum of 3.
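The executor removal described above is how dynamic allocation behaves when executors sit idle: an executor idle for longer than `spark.dynamicAllocation.executorIdleTimeout` (60 s by default, matching the log line) is released, down to `minExecutors`. A sketch of the settings described in the post, expressed as a `SparkConf`; `DynamicAllocationSketch` is a hypothetical name, and the 600 s timeout is an illustrative value, not something taken from the original job.

```scala
import org.apache.spark.SparkConf

object DynamicAllocationSketch {
  // Dynamic allocation settings matching the post: min 3, max 30 executors.
  // The 60 s idle timeout seen in the log is Spark's default for
  // spark.dynamicAllocation.executorIdleTimeout; "600s" is a hypothetical
  // value showing how to keep idle executors around longer.
  def conf(): SparkConf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true") // Spark 2.x dynamic allocation needs the external shuffle service
    .set("spark.dynamicAllocation.minExecutors", "3")
    .set("spark.dynamicAllocation.maxExecutors", "30")
    .set("spark.dynamicAllocation.executorIdleTimeout", "600s")

  def main(args: Array[String]): Unit =
    println(conf().toDebugString)
}
```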


ContextCleaner: Cleaned accumulator ...
Removing executor as it has been idle for 60 sec


Sample output


Convert to json: dataframe count 4
Convert to json: dataframe columns <7 column names, no large data>
Convert to json: Size of dataframe 508918002