Support Questions

Why is this Spark job so slow, and why does the stage that writes a DataFrame over JDBC create over a thousand tasks?



Below is a job that is triggered by a JDBC write that appends a DataFrame to a PostgreSQL table. The top stage (190) takes 20 minutes to complete and creates over 1,000 tasks:

[Screenshot: 21438-sparklog.png, Spark UI stages for this job]

The DataFrame is a union of five DataFrames, as shown in the DAG visualisation below:

[Screenshot: 21437-dag-vis.png, DAG visualisation of the union]
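A JDBC write runs one task per partition of the DataFrame being written, and each task opens its own connection to the database. A union typically keeps every partition of every input (Spark's `UnionRDD` concatenates the children's partitions), so the task count of the write stage is roughly the sum of the five inputs' partition counts. The sketch below does that arithmetic and estimates how many scheduling "waves" the stage needs. The input numbers are hypothetical: five inputs that each came out of a shuffle with the default `spark.sql.shuffle.partitions=200` (which would give exactly 1,000 tasks) and 16 executor cores. Substitute the real values, which you can check with `df.rdd.getNumPartitions()` on each input.

```python
import math


def union_write_plan(child_partitions, total_executor_cores):
    """Estimate task count and scheduling waves for a JDBC write of a union.

    Assumption (hypothetical model): the union keeps every child's partitions,
    and each executor core runs one task at a time.
    """
    if total_executor_cores <= 0:
        raise ValueError("total_executor_cores must be positive")
    # One write task (and one JDBC connection) per partition of the union.
    tasks = sum(child_partitions)
    # Number of rounds the scheduler needs if every core is busy each round.
    waves = math.ceil(tasks / total_executor_cores)
    return tasks, waves


# Hypothetical inputs: five DataFrames of 200 partitions each, 16 cores.
tasks, waves = union_write_plan([200] * 5, total_executor_cores=16)
print(tasks, waves)  # 1000 63
```

If the task count is far above what PostgreSQL can usefully absorb, the JDBC data source's `numPartitions` option (which coalesces the DataFrame down to that many partitions before writing) or an explicit `coalesce(n)` before the write caps both the number of tasks and the number of concurrent connections.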

How can I use the information in the logs to work out why this particular action is taking so long?

Thanks.
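Beyond the stage list, the Spark UI's stage detail page shows summary metrics (min, median, max) for task duration, and the same data is in the event log that Spark writes when `spark.eventLog.enabled` is true. The sketch below assumes an uncompressed, JSON-lines event log in which each finished task is a `SparkListenerTaskEnd` event carrying `"Stage ID"` and a `"Task Info"` object with `"Launch Time"` and `"Finish Time"` in epoch milliseconds. Check those field names against a line of your own log before relying on it. It groups tasks by stage and reports count, median, maximum and total duration.

```python
import json
from collections import defaultdict
from statistics import median


def summarize_stages(event_log_lines):
    """Summarise task count and durations (ms) per stage from event-log lines.

    Assumes the field names described above; other events are ignored.
    """
    durations = defaultdict(list)
    for line in event_log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue  # only finished tasks carry launch/finish times
        info = event["Task Info"]
        durations[event["Stage ID"]].append(info["Finish Time"] - info["Launch Time"])

    summary = {}
    for stage_id, ds in durations.items():
        summary[stage_id] = {
            "tasks": len(ds),
            "median_ms": median(ds),
            "max_ms": max(ds),
            "total_ms": sum(ds),
        }
    return summary
```

For example, `with open(path) as f: print(summarize_stages(f))`, where `path` is a hypothetical location of the application's event log. As rough heuristics: many tasks with a small median but a large total points to per-partition overhead (too many small partitions, each opening a connection). A maximum far above the median points to skew. Uniformly long tasks suggest the time is being spent on the database side, for example in the inserts themselves.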
