CDS 2.3 release 2 Lineage File Missing Error

I tried to upgrade Spark from 2.2 to 2.3 and got an error related to a missing lineage file, so the SparkContext could not be initialized. I rolled back to CDS 2.2 release 2. Does anyone have a way to fix this?

 

Thanks.


Just wanted to complete the thread here. This is now documented in the known issues section of the Spark 2.3 documentation, along with workarounds to mitigate the error. Thx.

 

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#concept_kgn_j3g...

 

 

In CDS 2.3 release 2, Spark jobs fail when lineage is enabled because
Cloudera Manager does not automatically create the associated lineage
log directory (/var/log/spark2/lineage) on all required cluster hosts.
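To check whether a particular host is affected, one option is to look for that directory directly. Below is a minimal Python sketch, assuming Python 3 is available on the host; the function name lineage_dir_status is hypothetical. Note that the writability check applies to the user running the script, which may not be the user the Spark driver actually runs as.

```python
import os

# Path taken from the known-issue note above; adjust if your deployment differs.
LINEAGE_DIR = "/var/log/spark2/lineage"


def lineage_dir_status(path: str = LINEAGE_DIR) -> str:
    """Hypothetical helper: report whether the lineage log directory is usable.

    Returns "missing", "not a directory", "not writable", or "ok".
    Writability is checked for the user running this script only.
    """
    if not os.path.exists(path):
        return "missing"
    if not os.path.isdir(path):
        return "not a directory"
    if not os.access(path, os.W_OK):
        return "not writable"
    return "ok"


if __name__ == "__main__":
    print(f"{LINEAGE_DIR}: {lineage_dir_status()}")
```

Running it on each host with the YARN NodeManager role should show which hosts lack the directory. Any result other than "ok" suggests a driver scheduled there could hit the missing-lineage-file error.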

Note that this feature is enabled by default in CDS 2.3 release 2. Implement one of the following workarounds to continue running Spark jobs.

Workaround 1 - Deploy the Spark gateway role on all hosts that are running the YARN NodeManager role

Cloudera Manager only creates the lineage log directory on hosts with Spark 2 roles deployed on them. However, this is not sufficient because the Spark driver can run on any host that is running a YARN NodeManager. To ensure Cloudera Manager creates the log directory, add the Spark 2 gateway role to every cluster host that is running the YARN NodeManager role. For instructions on how to add a role to a host, see the Cloudera Manager documentation: Adding a Role Instance.

Workaround 2 - Disable Spark Lineage Collection

To disable the feature:
1. Log in to Cloudera Manager and go to the Spark 2 service.
2. Click Configuration.
3. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection.
4. Click Save Changes.
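For Workaround 1, the hosts that still need the Spark 2 gateway role are the ones running a NodeManager without a gateway. A minimal Python sketch of that check, assuming you have already collected a host-to-roles inventory however suits your cluster; the inventory format, the function name, and the role labels below are hypothetical and are not Cloudera Manager identifiers.

```python
from typing import Dict, List, Set

# Illustrative role labels only; not the exact names Cloudera Manager uses.
NODEMANAGER = "YARN_NODEMANAGER"
SPARK2_GATEWAY = "SPARK2_GATEWAY"


def hosts_needing_gateway(inventory: Dict[str, Set[str]]) -> List[str]:
    """Return hosts that run a NodeManager but have no Spark 2 gateway role.

    `inventory` maps a host name to the set of role labels deployed on it.
    The result is sorted so it is stable across runs.
    """
    return sorted(
        host
        for host, roles in inventory.items()
        if NODEMANAGER in roles and SPARK2_GATEWAY not in roles
    )


if __name__ == "__main__":
    # Hypothetical three-host cluster for illustration.
    cluster = {
        "node1.example.com": {NODEMANAGER, SPARK2_GATEWAY},
        "node2.example.com": {NODEMANAGER},
        "edge1.example.com": {SPARK2_GATEWAY},
    }
    print(hosts_needing_gateway(cluster))  # ['node2.example.com']
```

Hosts without a NodeManager (like the edge host above) are ignored, since the note only requires the gateway role where a Spark driver can be scheduled by YARN.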