Member since 02-20-2018 | 9 Posts | 5 Kudos Received | 0 Solutions
06-11-2018 09:23 PM | 1 Kudo
Just wanted to complete the thread here. This is now documented in the known issues section of the Spark 2.3 documentation, followed by workarounds to mitigate the error. Thanks.
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#concept_kgn_j3g_5db
In CDS 2.3 release 2, Spark jobs fail when lineage is enabled because Cloudera Manager does not automatically create the associated lineage log directory (/var/log/spark2/lineage) on all required cluster hosts. Note that this feature is enabled by default in CDS 2.3 release 2.
Implement one of the following workarounds to continue running Spark jobs.
Workaround 1 - Deploy the Spark gateway role on all hosts that are running the YARN NodeManager role
Cloudera Manager only creates the lineage log directory on hosts with Spark 2 roles deployed on them. However, this is not sufficient because the Spark driver can run on any host that is running a YARN NodeManager. To ensure Cloudera Manager creates the log directory, add the Spark 2 gateway role to every cluster host that is running the YARN NodeManager role.
For instructions on how to add a role to a host, see the Cloudera Manager documentation: Adding a Role Instance
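To see whether a particular host is affected, it can help to check for the lineage log directory directly. The following is only a minimal sketch, not something from the Cloudera documentation: the function name `lineage_dir_status` is made up for illustration, the path is the one quoted in the known issue, and the check runs with the permissions of whoever executes the script, which may differ from the user the Spark driver runs as.

```python
import os

# Path quoted in the known-issue text; adjust if your cluster uses another one.
LINEAGE_DIR = "/var/log/spark2/lineage"


def lineage_dir_status(path=LINEAGE_DIR):
    """Hypothetical helper: report 'ok', 'missing', or 'not_writable' for path."""
    if not os.path.isdir(path):
        return "missing"
    # Creating files in a directory needs both write and execute permission.
    # This reflects the current user only (an assumption of this sketch).
    if not os.access(path, os.W_OK | os.X_OK):
        return "not_writable"
    return "ok"


if __name__ == "__main__":
    print(f"{LINEAGE_DIR}: {lineage_dir_status()}")
```

Running this on each host that has the YARN NodeManager role would show which hosts still lack the directory after the gateway role has been added.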
Workaround 2 - Disable Spark Lineage Collection
To disable the feature, log in to Cloudera Manager and go to the Spark 2 service. Click Configuration. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection. Click Save Changes.
04-19-2018 11:13 AM
I believe we could transition the Hive jobs entirely to Spark. Only one is left; we deprecated the rest. So we are almost clear to go to Kubernetes.