Spark2 taking longer time than spark1

I am upgrading Spark from version 1.6 to 2.3. As part of the upgrade I made some configuration changes.

However, the job now takes 3-4 times longer to complete than it did on Spark 1.

I compared both versions and identified the specific point where Spark 2 seems to be taking more time:

import org.apache.spark.sql.DataFrame
import org.apache.spark.util.SizeEstimator

// Keep only the input DataFrames that are actually defined
val filtersDf: List[DataFrame] = inputData.filter(_.isDefined).map(_.get)

// Convert the modified PO DataFrames to JSON strings
val postProcessModifiedPoList = cleanedObjModifiedPoLists.foldLeft(Seq[String]())((modVoJSONs, data) => {
  logger.info("Convert to json: dataframe count {}", data.count().toString)
  logger.info("Convert to json: dataframe columns {}", data.columns.mkString(","))
  logger.info("Convert to json: Size of dataframe {}", SizeEstimator.estimate(data).toString)
  modVoJSONs ++ data.toJSON.collect()
})
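
For comparison, the same conversion without the per-DataFrame logging looks roughly like the sketch below (same cleanedObjModifiedPoLists as above). The count() call in each iteration runs a separate Spark job and SizeEstimator.estimate() does its own driver-side work, so this version only evaluates each DataFrame once:

// Minimal sketch: collect the JSON rows from every DataFrame in one pass,
// without the extra count() / SizeEstimator.estimate() calls per iteration.
val postProcessModifiedPoList: Seq[String] =
  cleanedObjModifiedPoLists.flatMap(df => df.toJSON.collect())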

When I checked the logs, the job seems to hang at one point: it keeps printing the messages below for 10-15 minutes and starts removing executors, so only 3 executors are left for the final stage, which takes more time. We are using dynamic allocation with a maximum of 30 executors and a minimum of 3.

ContextCleaner: Cleaned accumulator ...
Removing executor as it has been idle for 60 sec
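
For reference, the dynamic allocation settings are roughly as sketched below (standard Spark 2.x property names; the app name is made up, and 60s is the default executorIdleTimeout, which matches the log message above):

import org.apache.spark.sql.SparkSession

// Sketch of the dynamic allocation setup described above:
// min 3 / max 30 executors, executors removed after 60s idle (Spark default).
val spark = SparkSession.builder()
  .appName("po-json-conversion")                       // hypothetical app name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")     // needed for dynamic allocation on YARN
  .config("spark.dynamicAllocation.minExecutors", "3")
  .config("spark.dynamicAllocation.maxExecutors", "30")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()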

Sample output

Convert to json: dataframe count 4
Convert to json: dataframe columns (only 7 columns, none with huge data)
Convert to json: Size of dataframe 508918002
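
I am not sure how to interpret the ~485 MB size estimate for a 4-row DataFrame. If I understand SizeEstimator correctly, estimate() measures the driver-side object graph of whatever it is passed, so for a DataFrame that includes the query plan and session references rather than just the row data. A sketch of what I mean (illustrative only):

import org.apache.spark.util.SizeEstimator

// Estimate the collected rows instead of the DataFrame object itself
// ('data' is one DataFrame from the loop above).
val rows = data.collect()
logger.info("Size of collected rows: {}", SizeEstimator.estimate(rows).toString)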