Member since: 06-16-2016
Posts: 5
Kudos Received: 0
Solutions: 0
06-22-2016
12:02 AM
Thanks for the response. There are a number of reasons why you wouldn't want a restart on the same node. In our case, a corrupted file repeatedly caused the worker to fail. You could also hit the same situation if the node ran out of disk space or had other process dependencies that were failing for some reason.
06-17-2016
03:05 PM
Thanks for the response. I understand this behavior, but I am wondering how to get out of the situation, since the worker simply sits there restarting. Is there anything that can be done to stop or delay the heartbeat? It seems the restart happens fast enough to keep Nimbus from reassigning the worker to a different node where it could run properly.
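For reference, these are the cluster-level timeouts that control how quickly a dead worker is detected and when Nimbus will reassign its executors. On a real cluster they live in storm.yaml on the nimbus and supervisor hosts; the snippet below is only a sketch that documents the keys as a Java map, and the values shown are assumed defaults that should be checked against your Storm version's defaults.yaml.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: key names as they appear in Storm's defaults.yaml; on a real
// cluster these belong in storm.yaml, not in application code.
public class WorkerTimeoutSketch {
    public static Map<String, Object> heartbeatTimeouts() {
        Map<String, Object> conf = new HashMap<>();
        // Supervisor kills and relaunches a worker whose local heartbeat is older than this.
        conf.put("supervisor.worker.timeout.secs", 30);
        // Extra grace period a freshly launched worker gets before it must heartbeat.
        conf.put("supervisor.worker.start.timeout.secs", 120);
        // Nimbus declares a topology's executors dead (and may reassign them) after this.
        conf.put("nimbus.task.timeout.secs", 30);
        // Nimbus considers an entire supervisor node gone after this.
        conf.put("nimbus.supervisor.timeout.secs", 60);
        return conf;
    }
}
```

If those are roughly the values in play, it would explain the behaviour in this thread: the supervisor relaunches the dead worker well inside nimbus.task.timeout.secs, so the executors never look dead to Nimbus and there is nothing for it to reschedule onto another node.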
06-16-2016
11:41 PM
Does anyone have any suggestions on how to debug Netty failures in Storm? We are hitting these regularly and cannot seem to determine the cause. It would be helpful if the Netty error output said more than that something failed, and actually indicated whether there was a timeout, the destination couldn't be resolved, etc. That sort of detail is pretty standard for routing errors.
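In case it helps anyone hitting the same thing, these are the Netty transport settings whose values show up in those connection-retry messages. Again this is only a sketch: the keys come from Storm's defaults.yaml, the values are illustrative and vary by release, and on a real cluster they would be set in storm.yaml rather than in code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: Netty transport keys from Storm's defaults.yaml. Raising the
// retry count or backoff mostly changes how long a worker keeps trying before
// giving up; it does not make the underlying error message any more descriptive.
public class NettyTransportSketch {
    public static Map<String, Object> nettyDefaults() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("storm.messaging.netty.buffer_size", 5242880);  // send buffer, in bytes
        conf.put("storm.messaging.netty.max_retries", 300);      // connection attempts before giving up
        conf.put("storm.messaging.netty.min_wait_ms", 100);      // initial reconnect backoff
        conf.put("storm.messaging.netty.max_wait_ms", 1000);     // backoff ceiling
        conf.put("storm.messaging.netty.client_worker_threads", 1);
        conf.put("storm.messaging.netty.server_worker_threads", 1);
        return conf;
    }
}
```

In practice the "cannot connect" and "dropping message" lines are usually a symptom rather than the root cause: the remote worker has died or is being restarted, so the more useful detail tends to be in that worker's own log on the destination node rather than in the Netty client output.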
06-16-2016
11:34 PM
The Netty error reporting in Storm is very poor: all it tells you is that it cannot connect to a given host and port, or that it is dropping a message. That makes it hard to identify what is actually causing connection failures or lost messages.
06-16-2016
04:04 PM
We are encountering a situation where a Storm worker dies during processing due to a sporadic problem on the node. Unfortunately, the worker process is repeatedly relaunched with the same topology on the same node where the problem occurred instead of being launched elsewhere. There are plenty of free slots, so we would expect the system to recognise that the topology is failing there and place the topology components elsewhere. Any suggestions on what to look at?
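One thing worth noting: the stock schedulers in this era of Storm do not track node health, so as long as the supervisor reports in, the existing assignment keeps landing on the same slot. Later Storm releases added a blacklist scheduler that temporarily excludes nodes whose supervisors or workers keep failing. The sketch below shows the relevant keys as a Java map, assuming a Storm version that ships org.apache.storm.scheduler.blacklist.BlacklistScheduler; key names and values should be checked against that version's defaults.yaml, and on a real cluster they belong in storm.yaml on the Nimbus host.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: cluster-level (storm.yaml) settings for the blacklist scheduler,
// which is only available in later Storm releases. Values are illustrative.
public class BlacklistSchedulerSketch {
    public static Map<String, Object> blacklistConfig() {
        Map<String, Object> conf = new HashMap<>();
        // Replace the default scheduler with the blacklisting one.
        conf.put("storm.scheduler", "org.apache.storm.scheduler.blacklist.BlacklistScheduler");
        // A node is blacklisted once it accumulates this many faults...
        conf.put("blacklist.scheduler.tolerance.count", 3);
        // ...within this sliding window (seconds).
        conf.put("blacklist.scheduler.tolerance.time.secs", 300);
        // How long the node stays blacklisted before it becomes schedulable again (seconds).
        conf.put("blacklist.scheduler.resume.time.secs", 1800);
        return conf;
    }
}
```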