
CDS 2.3 release 2 Lineage File Missing Error

Explorer

I tried to upgrade Spark from CDS 2.2 to CDS 2.3 and got an error about a missing lineage file, which prevented the SparkContext from initializing. I have rolled back to CDS 2.2 release 2 for now. Does anyone have a way to fix this?

 

Thanks.

10 REPLIES

New Contributor

I get the same error as you. Did you solve this?

Explorer

No, I haven't. So far, there have been no answers.

Expert Contributor

Thanks for reporting. Could you share the full error about the missing lineage file, please? I quickly tested an upgrade from 2.2 to 2.3 but didn't hit this. A full stack trace would certainly help.

Explorer

Here is the full stack trace from when I try to launch spark-shell.

 

18/05/02 02:47:37 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2364)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:553)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
	at org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
	at $line3.$read$$iw$$iw.<init>(<console>:15)
	at $line3.$read$$iw.<init>(<console>:43)
	at $line3.$read.<init>(<console>:45)
	at $line3.$read$.<init>(<console>:49)
	at $line3.$read$.<clinit>(<console>)
	at $line3.$eval$.$print$lzycompute(<console>:7)
	at $line3.$eval$.$print(<console>:6)
	at $line3.$eval.$print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
	at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
	at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:79)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:79)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkILoop.scala:79)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:79)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:79)
	at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:91)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:78)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:78)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:78)
	at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
	at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:77)
	at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:110)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
	at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
	at org.apache.spark.repl.Main$.doMain(Main.scala:76)
	at org.apache.spark.repl.Main$.main(Main.scala:56)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: Lineage directory /var/log/spark2/lineage doesn't exist or is not writable.
	at com.cloudera.spark.lineage.LineageWriter$.checkLineageConfig(LineageWriter.scala:158)
	at com.cloudera.spark.lineage.NavigatorAppListener.<init>(ClouderaNavigatorListener.scala:30)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2740)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2732)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2732)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2353)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2352)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2352)
	... 62 more
org.apache.spark.SparkException: Exception when registering SparkListener
  at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2364)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:553)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
  ... 55 elided
Caused by: java.io.FileNotFoundException: Lineage directory /var/log/spark2/lineage doesn't exist or is not writable.
  at com.cloudera.spark.lineage.LineageWriter$.checkLineageConfig(LineageWriter.scala:158)
  at com.cloudera.spark.lineage.NavigatorAppListener.<init>(ClouderaNavigatorListener.scala:30)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2740)
  at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2732)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
  at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2732)
  at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2353)
  at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2352)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2352)
  ... 62 more

Hope this helps.

 

Cheers,

Ben

Expert Contributor

Thanks @Benassi10 for providing the context. Much appreciated.

 

We are discussing this internally to see what could cause such issues. One theory is that support for Spark lineage was enabled in CDS 2.3, and if the cm-agent does not create the /var/log/spark2/lineage directory (for some reason), you can see this behaviour. If lineage is not important to you, can you try running the shell with lineage disabled?

 

spark2-shell --conf spark.lineage.enabled=false
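The same flag should also work for batch jobs submitted from a gateway host, e.g. (the application name below is a placeholder):

spark2-submit --conf spark.lineage.enabled=false your_app.py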

 

If you don't want to disable lineage, another workaround would be to change the lineage directory to /tmp (CM > Spark2 > Configuration > GATEWAY Lineage Log Directory), followed by redeploying the client configuration.
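For reference, after redeploying, the change should show up in the gateway's Spark defaults. A minimal sketch of the relevant entries, assuming the standard CDS client configuration path and the spark.lineage.log.dir property name used in CM-generated configs (both are assumptions; check your own deployment):

# /etc/spark2/conf/spark-defaults.conf (regenerated by CM on redeploy;
# change values in CM rather than hand-editing on managed clusters)
spark.lineage.enabled=true
spark.lineage.log.dir=/tmp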

 

Let us know if the above helps. I will update the thread once I have more information on the fix.

New Contributor

After I changed the directory to /tmp, I verified that Spark 2.3 works normally.

 

Is there any possibility of a new release of Spark 2.3 that fixes this?

Expert Contributor

Thanks, Lucas. That's great to hear!

Can you please check whether toggling it back to /var/log/spark2/lineage, followed by redeploying the client configuration, helps too?

 

As promised, once the fix is identified I will update this thread. 

Explorer

I got it to work too by changing the directory to /tmp, but when I changed it back to /var/log/spark2/lineage, the error came back. So, I created the directory manually: I changed the spark2 and lineage directories to be owned by spark:spark, and made the lineage directory writable by all (rwxrwxrwx) with the sticky bit (t) set. After doing this, the error goes away.
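For anyone who wants to reproduce that, here is a sketch of the commands (assumes sudo access on the affected host; adjust to your environment):

sudo mkdir -p /var/log/spark2/lineage
sudo chown -R spark:spark /var/log/spark2      # spark2 and lineage dirs owned by spark:spark
sudo chmod 1777 /var/log/spark2/lineage        # rwxrwxrwt: writable by all, sticky bit set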

Expert Contributor

Cool. I will feed this back into the internal Jira where we are discussing this issue.

Thx for sharing.

Expert Contributor

Just wanted to complete the thread here. This is now documented in the known issues section of the Spark 2.3 documentation, along with workarounds to mitigate the error. Thx.

 

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#concept_kgn_j3g...

 

 

In CDS 2.3 release 2, Spark jobs fail when lineage is enabled because Cloudera Manager does not automatically create the associated lineage log directory (/var/log/spark2/lineage) on all required cluster hosts. Note that this feature is enabled by default in CDS 2.3 release 2. Implement one of the following workarounds to continue running Spark jobs.

Workaround 1 - Deploy the Spark gateway role on all hosts that are running the YARN NodeManager role

Cloudera Manager only creates the lineage log directory on hosts with Spark 2 roles deployed on them. However, this is not sufficient because the Spark driver can run on any host that is running a YARN NodeManager. To ensure Cloudera Manager creates the log directory, add the Spark 2 gateway role to every cluster host that is running the YARN NodeManager role. For instructions on how to add a role to a host, see the Cloudera Manager documentation: Adding a Role Instance.

Workaround 2 - Disable Spark Lineage Collection

To disable the feature, log in to Cloudera Manager and go to the Spark 2 service. Click Configuration. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection. Click Save Changes.
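If you go with Workaround 1, a quick way to sanity-check the result is to confirm the lineage directory exists and is writable on every host that runs a YARN NodeManager. A minimal sketch (the host names are placeholders; substitute your own):

for h in nm-host1 nm-host2 nm-host3; do
  ssh "$h" 'test -w /var/log/spark2/lineage \
    && echo "$(hostname): lineage dir OK" \
    || echo "$(hostname): lineage dir missing or not writable"'
done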

 
