One of the bigger architectural decisions when designing an #NG9-1-1 implementation is where the NGCS (the NG9-1-1 Core Services) go, such as the ECRF, LVF, ESRP and bridge. I’ve been consulting on state-wide NG9-1-1 deployments and have seen proposals from most of the vendors in the space. I see the following choices:
- State-specific nodes located in state, or in some cases, in an adjacent state
- Shared nodes in vendor-operated data centers
- Shared nodes in public cloud sites
In the latter two options, while I believe the implementations I’ve seen actually do share the functions, it would be easy, and I think slightly better, to have separate instances per state. I’ve seen a proposal that shared one site between two states, with a second, separate in-state site for each state; I believe the vendor ran separate instances on common hardware for each state in the shared site. I think that’s a wise choice.
The advantage of a state-specific instance, whether it’s in an in-state site or a public cloud site, is that a failure in one state would not necessarily affect the system in another state. I say “necessarily” because it depends on the failure. In the case of two adjacent states sharing a site with separate instances, a hardware failure could still affect both states. With separate instances in a public cloud, a failure that affected the whole region (which is not uncommon) would affect all the states served by that region. Most of the more recent 9-1-1 failures affected more than one state, so this is a very important consideration. I lean heavily towards an in-state solution.
The advantages of shared instances of any flavor are, of course, mostly cost savings, and the price should reflect that. If the data center is a major hub for the vendor, and the vendor maintains 24/7 staff in that data center as well as a full complement of spares, then the Mean Time To Repair (MTTR), which is an important but often overlooked component of availability (“nines”), is significantly lower. Sometimes I see one site with on-site technicians, while the other sites are remote and take multiple hours for parts and technicians to arrive. That negates the advantage. State-specific sites most often have techs hours away and few if any on-site spares. Vendors can fix that, but… cost.
It’s all in the MTTR numbers, which vendors should understand and customers should inquire about. MTTR in a public cloud site is a complex subject. The hardware, and the effect of a hardware failure on a running system, looks to the customer like really, really high availability. However, whole-region failures, which are almost always software issues, are so common that they tend to wipe out that apparent hardware MTTR advantage.
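To make the MTTR point concrete, here is a rough sketch of the arithmetic; the failure rate and the two repair times below are illustrative assumptions, not figures from any vendor or data center.

```python
# Rough availability arithmetic for a single site: availability = MTBF / (MTBF + MTTR).
# The MTBF and both repair times are assumptions for illustration only.
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    return mtbf_hours / (mtbf_hours + mttr_hours)

mtbf = 8760.0  # assume roughly one failure a year (8,760 hours between failures)

staffed_hub = availability(mtbf, 1.0)   # 24/7 techs and spares on site: ~1 hour to repair
remote_site = availability(mtbf, 8.0)   # parts and techs hours away: ~8 hours to repair

print(f"staffed hub: {staffed_hub:.6f}")   # ~0.999886 (just under four nines)
print(f"remote site: {remote_site:.6f}")   # ~0.999088 (about three nines)
```

Same assumed failure rate, very different availability: that is the whole argument for asking a vendor exactly how long repairs take at each site.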
The thing that makes me much less comfortable with out-of-state data centers is tromboning of media and, to a lesser extent, signaling. If a local caller, on a local originating service provider, calls a local PSAP, the distance between where the call comes from and where it goes (the PSAP) can be a small number of miles. To be sure, mobile networks tend to be more regionalized, so the call may actually be handled a ways away from both ends, but backhauling a local call to a local PSAP through a data center multiple states away is not good.
The effect of tromboning is actually fairly mild: some extra delay in the media path. It’s typically only a few states (maybe a New England call to an Atlanta data center), not all the way across the country, except in extreme circumstances. However, as anyone who has worked with me on NG9-1-1 system design knows, I’m always thinking about how our systems work in disasters. Think Katrina, Loma Prieta, Sandy or 9/11. We know that sometimes we get islands of connectivity. So there might be an IP path working between the caller and the PSAP, but not if the call has to go back to Atlanta. I really think at least one close-by data center is a better design choice.
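To put a rough number on that “extra delay,” here is a back-of-envelope sketch; the fiber speed, route-inflation factor, and distance are all assumptions for illustration.

```python
# Back-of-envelope extra media delay from tromboning. Every number here is an
# assumption: light in fiber covers roughly 200 km per millisecond, and real
# routes are longer than straight lines, so apply an inflation factor.
FIBER_KM_PER_MS = 200.0   # approximate one-way propagation in fiber
ROUTE_INFLATION = 1.5     # assumed detour factor for real-world fiber paths

def extra_rtt_ms(detour_km: float) -> float:
    """Added round-trip media delay for hauling a call detour_km out of the way."""
    one_way_ms = (detour_km * ROUTE_INFLATION) / FIBER_KM_PER_MS
    return 2 * one_way_ms   # the media goes out and comes back

# Roughly New England to Atlanta (~1,500 km each way, assumed):
print(f"{extra_rtt_ms(1500):.1f} ms")   # ~22.5 ms of added round trip
```

A couple of tens of milliseconds won’t ruin a voice call, which is why the islanding scenario above worries me more than the delay itself.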
On the other, other hand, having all the data centers in one geographic area means that a single large weather event or other natural disaster can take them all out. That means you really, really want at least one site that is nowhere near you. Yes, that event could take out enough network that it wouldn’t matter anyway, but if you have a robust enough network design, with excellent path, vendor and technology diversity, it might work anyway. In my experience, IP networks are the most reliable networks in disasters, if they are designed with diversity in mind. Cost, of course, interferes.
So, ideally you would have two or three data centers in state, one or two across the country, and maybe one in an adjacent state. I have an article coming in NENA’s The Call about availability, but one take-away is that for all the NG9-1-1 system designs I know about, two sites is not enough to get 5 nines. You need more. 5 or 6 is a good number 🙂
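Here is a quick sketch of the arithmetic behind that takeaway. The per-site availability is an assumption I picked for illustration, and the formula treats site failures as independent, which the multi-state outages mentioned earlier show they are not; that gap is exactly why the practical answer is more sites than the naive math suggests.

```python
# Why two sites usually isn't enough for five nines. The per-site availability
# is an assumed, illustrative number, and the formula assumes independent
# failures -- regional disasters and shared software faults violate that,
# which pushes the practical site count higher still.
def system_availability(per_site: float, n_sites: int) -> float:
    """Availability of n identical sites, assuming independent failures."""
    return 1.0 - (1.0 - per_site) ** n_sites

PER_SITE = 0.99   # assumed: about two nines for a whole site (power, network, software)
for n in (1, 2, 3):
    print(f"{n} site(s): {system_availability(PER_SITE, n):.6f}")
# 1 site(s): 0.990000  (two nines)
# 2 site(s): 0.999900  (four nines -- still short of five)
# 3 site(s): 0.999999  (on paper; correlated failures erode this in practice)
```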
So where does this leave a state looking to deploy NG9-1-1? Well, as always, it depends. If a vendor proposes a shared system, at least one of its data centers is very close to you, and the cost is significantly lower, then I think that’s a good set of tradeoffs. If there is no significant cost difference, another vendor is offering an in-state solution, and that vendor has a reasonable MTTR plan, then I’d go that direction. Of course, this is one of many design decisions a state would consider when selecting a vendor, so we’re really talking about a scoring component rather than an actual vendor decision.
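To illustrate what I mean by a scoring component, here is a hypothetical slice of an RFP evaluation sheet; the criteria, weights, and vendor scores are all made up for this example and not drawn from any real procurement.

```python
# A hypothetical slice of an RFP scoring sheet. Criteria, weights, and scores
# are invented for illustration; a real procurement defines its own.
CRITERIA_WEIGHTS = {
    "data_center_architecture": 0.15,   # in-state sites, MTTR plan, diversity
    "cost": 0.30,
    "functional_compliance": 0.35,
    "transition_plan": 0.20,
}

def total_score(scores: dict) -> float:
    """Weighted total for one vendor; each criterion scored 0-5."""
    return sum(CRITERIA_WEIGHTS[c] * s for c, s in scores.items())

vendor_a = {"data_center_architecture": 5, "cost": 3, "functional_compliance": 4, "transition_plan": 4}
vendor_b = {"data_center_architecture": 3, "cost": 5, "functional_compliance": 4, "transition_plan": 4}

print(f"{total_score(vendor_a):.2f} vs {total_score(vendor_b):.2f}")   # 3.85 vs 4.15
```

Note that the vendor with the stronger architecture score doesn’t automatically win; that is the sense in which this is one component among many.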
Of course, I think there are better ways to do this that address all these issues, but states have to choose among what vendors actually have available.
See my first post to find out more about me, and my point of view.