> Now: that same European instance goes down. Proxies in Asia need to know about that, right away, and this time you can't afford to wait.
But they have to. Physically no solution will be instantaneous because that’s not how the speed of light nor relativity works - even two events next to each other cannot find out about each other instantaneously. So then the question is “how long can I wait for this information”. And that’s the part that I feel isn’t answered - eg if the app dies, the TCP connections die and in theory that information travels as quickly as anything else you send. It’s not reliably detectable but conceivably you could have an eBPF program monitoring death and notifying the proxies. Thats the part that’s really not explained in the article which is why you need to maintain an eventually consistent view of the connectivity. I get maybe why that could be useful but noticing app connectivity death seems wrong considering I believe you’re more tracking machine and cluster health right? Ie not noticing an app instance goes down but noticing all app instances on a given machine are gone and consensus deciding globally where the new app instance will be as quickly as possible?
A request routed to a dead instance doesn't fall into a black hole: our proxies reroute it. But that's very slow; to deliver acceptable service quality you need to minimize the number of times that happens. So you can't accept a solution that leaves large windows of time within which every instance that has gone down has a stale entry. Remember: instances coming up and down happens all the time on this platform! It's part of the point.
But they have to. Physically no solution will be instantaneous because that’s not how the speed of light nor relativity works - even two events next to each other cannot find out about each other instantaneously. So then the question is “how long can I wait for this information”. And that’s the part that I feel isn’t answered - eg if the app dies, the TCP connections die and in theory that information travels as quickly as anything else you send. It’s not reliably detectable but conceivably you could have an eBPF program monitoring death and notifying the proxies. Thats the part that’s really not explained in the article which is why you need to maintain an eventually consistent view of the connectivity. I get maybe why that could be useful but noticing app connectivity death seems wrong considering I believe you’re more tracking machine and cluster health right? Ie not noticing an app instance goes down but noticing all app instances on a given machine are gone and consensus deciding globally where the new app instance will be as quickly as possible?