
Time gap between two tasks in Spark



I am inserting data into a Hive table iteratively in Spark.


For example, say there are 10,000 items. First, these items are split into 5 lists of 2,000 items each. Then I iterate over those 5 lists.
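The batching described above can be sketched in plain Java as follows. This is a minimal illustration that assumes the items fit in a driver-side `List`. The class and method names (`BatchSplit`, `partition`) are hypothetical, and the actual per-batch Spark/Hive write is left as a comment rather than guessed.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplit {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            items.add(i);
        }
        List<List<Integer>> batches = partition(items, 2_000);
        for (int i = 0; i < batches.size(); i++) {
            // Hypothetical placeholder: in the real job, each batch would be
            // expanded into rows and inserted into the Hive table here.
            System.out.println("batch " + i + ": " + batches.get(i).size() + " items");
        }
    }
}
```

Running `main` prints five lines, `batch 0: 2000 items` through `batch 4: 2000 items`.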


In each iteration, the 2,000 items map to many more rows, so by the end of the iteration about 15M records have been inserted into the Hive table. Each iteration completes in about 40 minutes.


My issue is that after each iteration, Spark waits before starting on the next 2,000 items. The wait is about 90 minutes! During that gap there are no active tasks in the Spark web UI.
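One way to narrow down where the 90 minutes go is to log the elapsed time of each iteration's write call on the driver and compare it with the job durations shown in the Spark web UI. The sketch below assumes the per-batch insert can be wrapped in a callback; `IterationTimer`, `timeIterations`, and the sleeping dummy workload are hypothetical stand-ins, not the actual job code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class IterationTimer {
    // Runs each iteration and records how long its call took, in milliseconds.
    static List<Long> timeIterations(int iterations, IntConsumer runIteration) {
        List<Long> durations = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            // Hypothetical stand-in for the Spark insert of batch i.
            runIteration.accept(i);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            durations.add(elapsedMs);
            System.out.println("iteration " + i + " took " + elapsedMs + " ms");
        }
        return durations;
    }

    public static void main(String[] args) {
        // Dummy workload (a short sleep) so the sketch runs without Spark.
        timeIterations(5, i -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```

If a call turns out to take roughly 130 minutes while the web UI shows only about 40 minutes of task activity, the gap is being spent inside the write call after its tasks finish, rather than between iterations.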


By the way, each iteration goes straight into Spark processing; no Scala or Java code runs at the beginning or end of the iterations.


Any ideas?