06-11-2018
09:23 PM
Just wanted to complete the thread here. This is now documented in the known issues section of the Spark 2.3 documentation, followed by workarounds to mitigate the error. Thx.
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#concept_kgn_j3g_5db
In CDS 2.3 release 2, Spark jobs fail when lineage is enabled because Cloudera Manager does not automatically create the associated lineage log directory (/var/log/spark2/lineage) on all required cluster hosts. Note that this feature is enabled by default in CDS 2.3 release 2.
Implement one of the following workarounds to continue running Spark jobs.
Workaround 1 - Deploy the Spark gateway role on all hosts that are running the YARN NodeManager role
Cloudera Manager only creates the lineage log directory on hosts with Spark 2 roles deployed on them. However, this is not sufficient because the Spark driver can run on any host that is running a YARN NodeManager. To ensure Cloudera Manager creates the log directory, add the Spark 2 gateway role to every cluster host that is running the YARN NodeManager role.
For instructions on how to add a role to a host, see the Cloudera Manager documentation: Adding a Role Instance
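Before or after adding the gateway role, it can help to confirm whether the lineage directory is actually present on a given NodeManager host. The following is only a minimal sketch, not part of the Cloudera documentation: the function name lineage_dir_status is hypothetical, the path is the one quoted in the known issue, and the writability check applies to the user running the script, which may differ from the user the Spark driver runs as.

```python
import os

# Path taken from the known-issue text above.
LINEAGE_DIR = "/var/log/spark2/lineage"


def lineage_dir_status(path=LINEAGE_DIR):
    """Return 'ok', 'missing', or 'not_writable' for the given directory.

    Hypothetical helper for illustration only. Writability is checked for
    the user running this script, not necessarily the Spark driver's user.
    """
    if not os.path.isdir(path):
        return "missing"
    # Creating files in a directory needs both write and execute permission.
    if not os.access(path, os.W_OK | os.X_OK):
        return "not_writable"
    return "ok"


if __name__ == "__main__":
    print(f"{LINEAGE_DIR}: {lineage_dir_status()}")
```

Running this on each host that has the YARN NodeManager role should show which hosts still lack the directory.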
Workaround 2 - Disable Spark Lineage Collection
To disable the feature, log in to Cloudera Manager and go to the Spark 2 service. Click Configuration. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection. Click Save Changes.
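If changing the service-wide setting is not an option, lineage can reportedly also be turned off for a single job by passing a Spark configuration override. The sketch below assumes the property is named spark.lineage.enabled; that name does not appear in the text above, so verify it against your CDS release before relying on it. The helper with_lineage_disabled is hypothetical and only builds an argument list; it does not launch anything.

```python
def with_lineage_disabled(submit_args):
    """Insert a lineage-off override right after the launcher command.

    Assumptions (not from the known-issue text): the property name
    'spark.lineage.enabled', and that submit_args[0] is the launcher
    command, for example 'spark2-submit'.
    """
    args = list(submit_args)
    override = ["--conf", "spark.lineage.enabled=false"]
    # Options must precede the application file, so place the override
    # immediately after the launcher command.
    return args[:1] + override + args[1:]


if __name__ == "__main__":
    # 'my_job.py' is a placeholder application name.
    print(" ".join(with_lineage_disabled(["spark2-submit", "my_job.py"])))
```

The resulting list can be passed to subprocess.run or joined into a command line for testing a single job before changing the cluster configuration.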