I'm running a Spark Streaming job that uses a direct connection to Kafka on HDP 2.3.4.
Unlike other jobs, for this one the driver log is filled with constant warnings like this:
6/02/15 16:02:22 WARN history.YarnHistoryService: Discarding event
I haven't seen that class (org.apache.spark.deploy.history.yarn.YarnHistoryService) in the standard Spark source code, so I wonder if it's an HDP thing.
I'm going to suppress those warnings to avoid filling up the logs, but I would appreciate any hints about what might be wrong or how to prevent it.
This is the YARN Timeline Server, which Spark uses to log application progress. Do you have the Timeline Server disabled?
From the code:
- If the timeline service is disabled, that is `yarn.timeline-service.enabled` is not `true`, then the history will not be published: the application will still run.
- Similarly, in a cluster where the timeline service is disabled, the history server will simply show an empty history, while warning that the history service is disabled.
- In a secure cluster, the user must have the Kerberos credentials to interact with the timeline server. Being logged in via `kinit` or a keytab should suffice.
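To check the first point quickly, here is a minimal Python sketch that reads a `yarn-site.xml` and reports whether `yarn.timeline-service.enabled` is set to `true`. The function name and the sample XML are only illustrative; on a real cluster you would load the actual file from your Hadoop configuration directory, whose location depends on your install.

```python
import xml.etree.ElementTree as ET

PROPERTY = "yarn.timeline-service.enabled"


def timeline_service_enabled(yarn_site_xml: str) -> bool:
    """Return True only if the property is present and set to 'true'.

    A missing property is treated as "not true", matching the wording
    of the first bullet above.
    """
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name", default="") or "").strip()
        if name == PROPERTY:
            value = (prop.findtext("value", default="") or "").strip()
            return value.lower() == "true"
    return False


# Illustrative sample, not taken from a real cluster.
SAMPLE = """<configuration>
  <property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(timeline_service_enabled(SAMPLE))
```

To check a real file, read it into a string first (for example with `open(path).read()`) and pass that string in.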
If you don't want to use or fix the Timeline Server, you might be able to disable logging to it by changing this setting:
spark.yarn.services (Ambari doesn't like me removing it completely, so you may need to remove it through the config settings instead; the code also says you can just misspell it)
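If the setting holds more than one entry, a small helper can compute the value with the history service removed. This is only a sketch: it assumes `spark.yarn.services` is a comma-separated list of class names, which I haven't verified for this HDP release, and `drop_history_service` is a made-up name.

```python
HISTORY_SERVICE_SUFFIX = "YarnHistoryService"


def drop_history_service(services: str) -> str:
    """Return the services value without any YarnHistoryService entry.

    Assumes a comma-separated list of class names (unverified assumption).
    """
    kept = []
    for entry in services.split(","):
        entry = entry.strip()
        if entry and not entry.endswith(HISTORY_SERVICE_SUFFIX):
            kept.append(entry)
    return ",".join(kept)


if __name__ == "__main__":
    current = "org.apache.spark.deploy.history.yarn.YarnHistoryService"
    # Prints an empty line: nothing is left once the history service is dropped.
    print(drop_history_service(current))
```

The result would then go back into the Spark configuration, through Ambari or `--conf` on `spark-submit`. I haven't confirmed whether an empty value is accepted there.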
Thanks for the pointers.
The timeline service is enabled (I have almost all settings at their defaults) and is working fine for other regular Spark jobs (I haven't tried running other streaming jobs in this cluster).
I don't have the cluster kerberized.