Support Questions

Find answers, ask questions, and share your expertise

Spark on Yarn - Unexpected end-of-input: was expecting closing quote for a string value

avatar
Super Collaborator

Hi all, after recently upgrading to CDH 5.11 I get tons of the following "Unexpected end-of-input" log entries related to "SPARK" (running on YARN) and classified as "ERRORS".

 

I'm experiencing malfunctionings (failed Oozie Jobs) and I believe they are related to these errors, so I'd really like to solve the causing issue and see if the situation gets any better.

 

In the logs, "source" is:

 

FsHistoryProvider

And "message" is:

 

Exception encountered when attempting to load application log hdfs://xxxxx.xxxxx.zz:8020/user/spark/applicationHistory/application_1494352758818_0117_1
com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
 at [Source: java.io.StringReader@1fec7fc4; line: 1, column: 3655]
	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1369)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:599)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:532)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:1517)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:1505)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:205)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:28)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
	at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2888)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2034)
	at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
	at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
	at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
	at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:583)
	at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$16.apply(FsHistoryProvider.scala:410)
	at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$16.apply(FsHistoryProvider.scala:407)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
	at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:407)
	at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$3$$anon$4.run(FsHistoryProvider.scala:309)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

Any suggestions/ideas? Thanks!

1 ACCEPTED SOLUTION

avatar
Champion
Based on some SO post, the exception is most likely related to some invalid JSON somewhere. This si the Spark History server though and I cannot think of any json files it would be using on a regular basis.

On mine I see a redaction-rules.json. Are you using redaction?

Oh wow, I think it was staring at us in the face. It is trying to read a specific application log which has invalid JSON characters. Read that file and put its output into a JSON validator to see what is invalid. I would save it somewhere so it can be review again if needed. Then remove it and try to run the job again. If it fails again, then something is causing it to create the invalid JSON in the application log.

View solution in original post

5 REPLIES 5

avatar
Super Collaborator

Additional info. If I run spark CLI (where my spark procedures are working, btw, differently from when they are launched in Oozie), as soon as I try to define a Dataframe I receive the following warning that I've never seen before the upgrade:

 

In [17]: utenti_DF = sqlContext.table("xxxx.yyyy")
17/07/19 15:48:58 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
17/07/19 15:48:58 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

Anyway, as I repeat, from CLI things work. I just thought this could be relevant

avatar
Champion
I am not sure about the first portion but the second bit seems to indicate that you did not upgrade your Hive schema during the upgrade process. You will need to stop Hive. In CM, go to the Hive process, the Action menu should have an Upgrade Hive Metastore Database Schema option. Run that to see if it clears up the warning.

avatar
Super Collaborator

Hi mbigelow, I've tried what ypu suggested (stop hive + running the action from the dropdown menu).

 

Process was successful, but the warning in spark CLI is still there...

avatar
Champion
Based on some SO post, the exception is most likely related to some invalid JSON somewhere. This si the Spark History server though and I cannot think of any json files it would be using on a regular basis.

On mine I see a redaction-rules.json. Are you using redaction?

Oh wow, I think it was staring at us in the face. It is trying to read a specific application log which has invalid JSON characters. Read that file and put its output into a JSON validator to see what is invalid. I would save it somewhere so it can be review again if needed. Then remove it and try to run the job again. If it fails again, then something is causing it to create the invalid JSON in the application log.

avatar
Super Collaborator

Thanks mbigelow, following your suggestions I solved the massive error logging issue.

 

I've processed in a Json validator the specific log file referenced in the Java stack trace:

 

/user/spark/applicationHistory/application_1494352758818_0117_1

 

But the format was correct, according to the validator. So I just moved it away in a temporary directory. As soon as I did it, the error messages stopped clogging the system logs. So it was probably corrupted in a very subtle way... But it was definitely corrupted

 

That Json file has been indeed generated by the Spark Action that is giving me problems, but it was an OLD file. New instances of that Spark Action are generating new Json logs, but they are not giving any troubles to the History Server (stopped having tons of exceptions logged as I just said)

 

Unfortunately, the Spark job itself is still failing and it's needing further investigation on my side, so apparently this is not related to that specific error message.

 

But I've solved an annoying problem, and at the same time I have cleared out the possibility of the Spark Action issue being related to that java exception

 

Thanks!