Telecom operators have strict requirements for their services, often referred to as “carrier-grade” requirements. These include the vaunted “five 9s” of availability: a service that is available 99.999% of the time (roughly 5.26 minutes of downtime a year). Recognizing that fault management wasn’t native to cloud computing platforms such as OpenStack, the Doctor project focuses on fault management and recovery, ensuring that applications return to a fully redundant configuration faster than before.
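The downtime budget implied by an availability target can be checked with simple arithmetic. The short sketch below is illustrative only; the function name is made up for this example, and it assumes a 365-day year.

```python
# Minutes in a 365-day year (leap years are ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare "three 9s", "four 9s", and "five 9s".
for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

Each additional 9 shrinks the budget by a factor of ten, which is why detection and reaction times measured in minutes cannot meet a five 9s target.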
Carrier-grade high availability support may be built in or provided by a platform, but the key requirement is very fast detection and reaction time to minimize service impact.
Doctor is a fault management and maintenance framework for high availability of network services on top of virtualized infrastructure. Doctor features immediate notification of a wide range of failure events from the NFV Infrastructure (NFVI) and supports orchestration of virtual network function (VNF) recovery. This is provided through immediate notification to the Virtualized Infrastructure Manager (VIM) when infrastructure is unavailable. As with all OPNFV projects (and in the spirit of NFV itself), Doctor is driven by multiple vendors and service providers. Doctor actively collaborates with the ETSI NFV ISG and upstream open source projects like OpenStack.
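To make the notification flow described above more concrete, here is a minimal, self-contained sketch of the idea: an infrastructure fault is reported to a VIM, which immediately notifies whoever manages the affected VNFs so that recovery can begin. Every name in it (`FaultEvent`, `ToyVIM`, `ToyVNFManager`, the host and VNF names) is hypothetical and chosen for illustration; this is not the Doctor code or any OpenStack API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class FaultEvent:
    host: str  # infrastructure element that failed (hypothetical name)
    kind: str  # e.g. "link_down"


@dataclass
class ToyVIM:
    """Hypothetical stand-in for a VIM; not an OpenStack or Doctor API."""
    host_to_vnfs: Dict[str, List[str]]
    subscribers: List[Callable[[str, FaultEvent], None]] = field(default_factory=list)
    down_hosts: Set[str] = field(default_factory=set)

    def subscribe(self, callback: Callable[[str, FaultEvent], None]) -> None:
        self.subscribers.append(callback)

    def report_fault(self, event: FaultEvent) -> None:
        # Record the host as unavailable, then notify subscribers right away
        # for every VNF that was running on it.
        self.down_hosts.add(event.host)
        for vnf in self.host_to_vnfs.get(event.host, []):
            for notify in self.subscribers:
                notify(vnf, event)


class ToyVNFManager:
    """Hypothetical consumer that switches a failed VNF to its standby."""

    def __init__(self, standby_for: Dict[str, str]) -> None:
        self.standby_for = standby_for
        self.failovers: List[Tuple[str, str]] = []

    def on_fault(self, vnf: str, event: FaultEvent) -> None:
        standby = self.standby_for.get(vnf)
        if standby is not None:
            self.failovers.append((vnf, standby))


vim = ToyVIM({"compute-1": ["vnf-a"], "compute-2": ["vnf-a-standby"]})
manager = ToyVNFManager({"vnf-a": "vnf-a-standby"})
vim.subscribe(manager.on_fault)
vim.report_fault(FaultEvent(host="compute-1", kind="link_down"))
print(manager.failovers)
```

The sketch uses push-style callbacks rather than polling because the recovery time is bounded by how quickly the consumer learns of the fault; a real deployment would add fault detection, authentication, and an actual failover action.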
Without Doctor, notifying the application manager in OpenStack of a failure took several minutes, a delay that could disconnect thousands of mobile subscribers. With Doctor, network operators can perform failure notification and recovery within one second. The live keynote demo at the OpenStack Summit in Barcelona showed how well Doctor performs in failure detection and notification, keeping a call connected even after lines were cut in the data center and meeting the high availability requirements for NFV.
The Doctor team has now contributed failure event collection and immediate notification features to the OpenStack Liberty, Mitaka, and Newton releases. Anyone looking for immediate alarming can now leverage these contributions, which were submitted to and accepted in OpenStack. Doctor is also involved in collaboration planning with other OPNFV projects (e.g., Barometer, the Software Fastpath Quality Metrics project), aiming to build an integrated platform for NFV. These activities will demonstrate further open collaboration across OPNFV and upstream projects.