
CDS 2.3 release 2 Lineage File Missing Error

Explorer

I tried to upgrade Spark from CDS 2.2 to CDS 2.3 and got an error about a missing lineage file, which prevented the SparkContext from initializing. I have rolled back to CDS 2.2 release 2 for now. Does anyone have a way to fix this?

 

Thanks.

10 REPLIES

New Contributor

I get the same error as you. Did you solve this?

Explorer

No, I haven't. So far, there have been no answers.

Expert Contributor

Thanks for reporting. Could you share the full error about the missing lineage file, please? I quickly tested an upgrade from 2.2 to 2.3 but didn't hit this. A full stack trace would certainly help.

Explorer

Here is the full stack trace from when I try to launch spark-shell.

 

18/05/02 02:47:37 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2364)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:553)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
	at org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
	at $line3.$read$$iw$$iw.<init>(<console>:15)
	at $line3.$read$$iw.<init>(<console>:43)
	at $line3.$read.<init>(<console>:45)
	at $line3.$read$.<init>(<console>:49)
	at $line3.$read$.<clinit>(<console>)
	at $line3.$eval$.$print$lzycompute(<console>:7)
	at $line3.$eval$.$print(<console>:6)
	at $line3.$eval.$print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
	at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
	at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:79)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:79)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkILoop.scala:79)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:79)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:79)
	at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:91)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:78)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:78)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:78)
	at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
	at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:77)
	at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:110)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
	at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
	at org.apache.spark.repl.Main$.doMain(Main.scala:76)
	at org.apache.spark.repl.Main$.main(Main.scala:56)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: Lineage directory /var/log/spark2/lineage doesn't exist or is not writable.
	at com.cloudera.spark.lineage.LineageWriter$.checkLineageConfig(LineageWriter.scala:158)
	at com.cloudera.spark.lineage.NavigatorAppListener.<init>(ClouderaNavigatorListener.scala:30)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2740)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2732)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2732)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2353)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2352)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2352)
	... 62 more
org.apache.spark.SparkException: Exception when registering SparkListener
  at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2364)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:553)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
  ... 55 elided
Caused by: java.io.FileNotFoundException: Lineage directory /var/log/spark2/lineage doesn't exist or is not writable.
  at com.cloudera.spark.lineage.LineageWriter$.checkLineageConfig(LineageWriter.scala:158)
  at com.cloudera.spark.lineage.NavigatorAppListener.<init>(ClouderaNavigatorListener.scala:30)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2740)
  at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2732)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
  at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2732)
  at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2353)
  at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2352)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2352)
  ... 62 more

Hope this helps.

 

Cheers,

Ben

Expert Contributor

Thanks @Benassi10 for providing the context. Much appreciated.

 

We are discussing this internally to see what could cause such issues. One theory is that support for Spark lineage was enabled in CDS 2.3, and if the cm-agent does not create the /var/log/spark2/lineage directory (for some reason), you can see this behaviour. If lineage is not important to you, can you try running the shell with lineage disabled?

 

spark2-shell --conf spark.lineage.enabled=false
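The same flag should also work for batch jobs submitted from a gateway host, e.g. (the application name below is a placeholder):

spark2-submit --conf spark.lineage.enabled=false your_app.py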

 

If you don't want to disable lineage, another workaround would be to change the lineage directory to /tmp (CM > Spark2 > Configuration > GATEWAY Lineage Log Directory), followed by redeploying the client configuration.
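For reference, after redeploying, the change should show up in the gateway's Spark defaults. A minimal sketch of the relevant entries, assuming the standard CDS client configuration path and the spark.lineage.log.dir property name used in CM-generated configs (both are assumptions; check your own deployment):

# /etc/spark2/conf/spark-defaults.conf (regenerated by CM on redeploy;
# change values in CM rather than hand-editing on managed clusters)
spark.lineage.enabled=true
spark.lineage.log.dir=/tmp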

 

Let us know if the above helps. I will update the thread once I have more information on the fix.

New Contributor

After I changed the directory to /tmp, I verified that Spark 2.3 works normally.

 

Is there any possibility of a new release of Spark 2.3 that fixes this?

Expert Contributor

Thanks, Lucas. That's great to hear!

Can you please check whether toggling it back to /var/log/spark2/lineage, followed by redeploying the client configuration, helps too?

 

As promised, once the fix is identified I will update this thread. 

Explorer

I got it to work too by changing the directory to /tmp, but when I changed it back to /var/log/spark2/lineage, the error came back. So, I created the directory manually: I changed the spark2 and lineage directories to be owned by spark:spark, and made the lineage directory writable by all (rwxrwxrwx) with the sticky bit (t) set. After doing this, the error goes away.
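For anyone who wants to reproduce that, here is a sketch of the commands (assumes sudo access on the affected host; adjust to your environment):

sudo mkdir -p /var/log/spark2/lineage
sudo chown -R spark:spark /var/log/spark2      # spark2 and lineage dirs owned by spark:spark
sudo chmod 1777 /var/log/spark2/lineage        # rwxrwxrwt: writable by all, sticky bit set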

Expert Contributor

Cool. I will feed this back into the internal Jira where we are discussing this issue.

Thx for sharing.

Expert Contributor

Just wanted to complete the thread here. This is now documented in the known issues section of the Spark 2.3 documentation, along with workarounds to mitigate the error. Thx.

 

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#concept_kgn_j3g...

 

 

In CDS 2.3 release 2, Spark jobs fail when lineage is enabled because Cloudera Manager does not automatically create the associated lineage log directory (/var/log/spark2/lineage) on all required cluster hosts. Note that this feature is enabled by default in CDS 2.3 release 2. Implement one of the following workarounds to continue running Spark jobs.

Workaround 1 - Deploy the Spark gateway role on all hosts that are running the YARN NodeManager role

Cloudera Manager only creates the lineage log directory on hosts with Spark 2 roles deployed on them. However, this is not sufficient because the Spark driver can run on any host that is running a YARN NodeManager. To ensure Cloudera Manager creates the log directory, add the Spark 2 gateway role to every cluster host that is running the YARN NodeManager role. For instructions on how to add a role to a host, see the Cloudera Manager documentation: Adding a Role Instance.

Workaround 2 - Disable Spark Lineage Collection

To disable the feature, log in to Cloudera Manager and go to the Spark 2 service. Click Configuration. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection. Click Save Changes.
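If you go with Workaround 1, a quick way to sanity-check the result is to confirm the lineage directory exists and is writable on every host that runs a YARN NodeManager. A minimal sketch (the host names are placeholders; substitute your own):

for h in nm-host1 nm-host2 nm-host3; do
  ssh "$h" 'test -w /var/log/spark2/lineage \
    && echo "$(hostname): lineage dir OK" \
    || echo "$(hostname): lineage dir missing or not writable"'
done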

 
