09-14-2018 05:25 AM - last edited on 09-14-2018 06:04 AM by cjervis
I am inserting data into a Hive table from Spark in iterations.
For example, say there are 10,000 items. These items are first split into 5 lists of 2,000 items each, and I then iterate over those 5 lists.
In each iteration, the 2,000 items map to many more rows, so by the end of an iteration 15M records have been inserted into the Hive table. Each iteration completes in 40 minutes.
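The loop structure described above can be sketched roughly as follows. This is only an illustration in plain Python, not the actual job: `split_into_batches` and `insert_batch` are hypothetical names, and `insert_batch` is a placeholder standing in for the Spark work that expands one batch into rows and writes them to the Hive table. The timestamps around each call show one way the driver-side gap between iterations could be measured.

```python
import time


def split_into_batches(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def insert_batch(batch):
    """Hypothetical placeholder for the Spark job that expands one batch
    into rows and inserts them into the Hive table."""
    return len(batch)  # stands in for the real work


items = list(range(10_000))                 # the 10,000 items
batches = split_into_batches(items, 2_000)  # 5 lists of 2,000 items each

previous_end = None
for index, batch in enumerate(batches, start=1):
    start = time.monotonic()
    if previous_end is not None:
        # Time spent between the end of one iteration and the start of the next
        print(f"gap before batch {index}: {start - previous_end:.3f}s")
    insert_batch(batch)
    previous_end = time.monotonic()
    print(f"batch {index}: {len(batch)} items, took {previous_end - start:.3f}s")
```

Logging the same kind of timestamps in the real driver code would show whether the idle time falls inside the insert call itself or between two iterations.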
My issue is what happens after each iteration: Spark waits before starting on the next 2,000 items. The waiting time is about 90 minutes! During that gap, there are no active tasks in the Spark web UI.
By the way, each iteration starts directly with the Spark processing. There is no Scala or Java code at the beginning or at the end of the iterations.