Member since: 03-23-2016
Posts: 21
Kudos Received: 5
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7025 | 08-12-2016 12:05 PM |
04-06-2017 12:07 PM · 1 Kudo
This problem has been happening on our side for many months as well, with both Spark 1 and Spark 2, and both when running jobs in the shell and in Python notebooks. It is very easy to reproduce: just open a notebook and let it run for a couple of hours, or run some simple DataFrame operations in an infinite loop. There seems to be something fundamentally wrong with the timeout configuration in the core of Spark. We will open a case for it, since the problem persists no matter what configurations we have tried.
08-12-2016 12:05 PM · 1 Kudo
I found the cause of the problem: it was a configuration issue. The NameNode was installed on master01, but the following parameter was set to worker02 (which runs no NameNode): dfs.namenode.http-address was worker02.cl02.sr.private:50070 instead of master01.cl02.sr.private:50070. The configuration got altered because the cluster was converted to an HA configuration and then taken back to non-HA; one of the NameNodes (the one on worker02) was deleted without anyone noticing that the remaining configuration still pointed at worker02. Hope I'm clear 🙂
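For reference, the corrected property in hdfs-site.xml would look something like this (a sketch using the hostnames from this cluster; adjust the host and port to wherever your NameNode actually runs):

```xml
<!-- hdfs-site.xml: point the NameNode HTTP address at the host that
     actually runs the NameNode (master01 here), not the removed worker02 -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>master01.cl02.sr.private:50070</value>
</property>
```

You can check the effective value on a node with `hdfs getconf -confKey dfs.namenode.http-address`, which is a quick way to spot this kind of leftover after an HA rollback.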