This problem has been happening on our side for many months as well, with both Spark 1 and Spark 2, and both when running jobs in the shell and in Python notebooks. It is very easy to reproduce: just open a notebook and let it run for a couple of hours, or run some simple DataFrame operations in an infinite loop. There seems to be something fundamentally wrong with the timeout handling in the core of Spark. We will open a case for it, because no matter what configurations we have tried, the problem persists.
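For reference, these are the kinds of timeout-related settings we experimented with in spark-defaults.conf, without success. The values below are illustrative, not a recommendation:

```
# Raise the network/RPC timeout (default is 120s)
spark.network.timeout             800s
# Executor heartbeat interval; must stay well below spark.network.timeout
spark.executor.heartbeatInterval  60s
```

Even with these raised well above the defaults, the disconnects still occurred.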
I agree with @cduby that there is a version conflict between the Hadoop library being used and what Spark actually expects. The best way to find such a problem is to use Maven's dependency:tree goal together with the artifact that contains the problematic class; it shows you which transitive dependencies your Spark application pulls in by default. I had exactly the same problem, and I solved it with the following process:

1. Find which artifact the org.apache.hadoop.util.StringUtils class belongs to. This is the hadoop-common library.
2. Run mvn dependency:tree to find out which version of this jar Spark fetches by default. (Note that automatic dependency resolution only happens if you haven't already provided the Hadoop libraries yourself.) In my case it was version 2.2 of Apache hadoop-common.
3. Find the version of the library that contains the correct StringUtils. This can be quite difficult, but in my case I happened to know from other projects that it was version 2.6.1.
4. Declare that dependency in your pom.xml before the Spark dependency, so that it takes precedence over Spark's transitive dependency.

Then it should work. The following hadoop-common dependency solved the problem for me:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.1</version>
</dependency>
<!-- Spark dependencies -->
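If you would rather not depend on declaration order, a dependencyManagement section pins the version explicitly. This is a sketch assuming a standard Spark project pom; the version number is the one from my case:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Pin hadoop-common so Spark's transitive 2.2 dependency is overridden -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Either way, you can confirm which version actually wins with mvn dependency:tree -Dincludes=org.apache.hadoop:hadoop-common before running the job.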