When designing its new OSS, BT followed three principles which we explore in this article: self-service (for network engineers), digital twin (of the network), and single source of truth (of network and service information).
Omdia view
Summary
In recent years, BT Group, the UK’s leading provider of fixed and mobile telecom services, has completely rebuilt large parts of its operations support system (OSS) estate, enabling it to reduce cost and increase operational agility. When designing the new OSS, BT’s software engineers followed three key principles, which we explore in this article: self-service (for network engineers), digital twin (of the network), and single source of truth (of network and service information).
Introduction
We have written in the past about how BT has been leveraging open-source software to modernize its OSS estate (see Further reading). We caught up recently with Ravi Ramachandran, Digital OSS Director at BT, to see how things have evolved.
Ramachandran is responsible for building next-generation software tools that help plan, build, design, and manage BT’s fixed network infrastructure and clouds that support consumers and businesses. When Ramachandran was tasked with modernizing BT’s OSS, the temptation was to make the OSS team’s life easier by simply consolidating the large number of antiquated IT systems they had accumulated over the years. Instead, they took a more targeted approach that would make life easier for their colleagues who plan, build, and manage the networks. This approach would ultimately enable BT to deliver a better customer experience, even if it meant more work for the OSS team.
To align the OSS team fully with the network organization, BT moved it out of the IT organization and into the network team under the chief network officer (CNO). This made collaboration between software engineers and network engineers a lot easier. The alignment also fitted with the trend of networks becoming more programmable and software-driven.
Over the years, BT had developed a plethora of OSS tools, but none could support a software-driven way of working. There were too many handoffs between systems. The tools were rigid, capable of doing only certain things, and the data they consumed was siloed.
Ramachandran and his team decided to build new OSS tools using open source. Applying the concept of NetDevOps, they set about creating low-code platforms that would enable network designers to deploy “infrastructure as code.” This software-driven approach would, in turn, allow engineers to build dynamic networks “as a service” for BT’s customer-facing units (consumer and business).
BT’s OSS team followed three key principles when designing the new OSS: self-service, digital twin, and single source of truth.
Self-service OSS accelerates pace of network change
Traditionally, when a new piece of infrastructure was deployed in the network, the process had many steps and took several months. BT would work with its equipment vendors to agree on device configurations primarily in the form of PDFs and spreadsheets, which would then be passed to the OSS teams, who would need to write code to update their systems to accommodate the new network devices. This required multiple handoffs across teams and systems.
Today, BT takes a NetDevOps approach. Its network engineers use platforms designed by the OSS team to introduce network devices in a standardized way.
A key platform is the unified catalog for service and resource management. The catalog contains specifications and configuration data for network infrastructure based on YANG data models from network equipment vendors. The models can be edited by BT’s network engineers if necessary. Lifecycle management and resource planning are defined with TOSCA templates, which add modeling constructs to the catalog in support of orchestration. For example, resource design templates are used for network instantiation and business process workflow orchestration. An example of a business process workflow is a capacity planning exercise, which triggers an automated plan-and-build activity.
Another key platform is BT’s next-generation activation engine (NGAE). NGAE YANG models are used with TOSCA templates to provision network resources and orchestrate services over hybrid networks (fixed, mobile, etc.).
Unified catalog and NGAE are directly accessible by network engineers, which means that the OSS software designers are no longer a delaying factor in allowing network engineers to make changes to the network. A CI/CD pipeline allows the network engineers to test changes and deploy new configurations to the live network.
This has reduced the time to market (TTM) for new services from many months to a few weeks. It has also provided an opportunity to upskill both the OSS and network teams. As the boundaries between software and network engineers disappear, they are able to cooperate more closely.
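To make the self-service idea more concrete, the sketch below shows the kind of pre-deployment check a CI/CD pipeline stage might run on an engineer's proposed change. It is an illustration only: the `LeafSpec` class, the `INTERFACE_SPEC` entry, and the `validate_change` function are hypothetical names, the constraints are invented, and nothing here reflects BT's actual catalog schema or tooling. A real pipeline would validate against the YANG models themselves using a YANG-aware toolchain.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LeafSpec:
    """Hypothetical constraint on one configuration field (loosely YANG-like)."""
    type_: type
    minimum: int | None = None
    maximum: int | None = None


# Hypothetical catalog entry for an Ethernet interface; not a real vendor model.
INTERFACE_SPEC = {
    "name": LeafSpec(str),
    "mtu": LeafSpec(int, minimum=68, maximum=9216),
    "enabled": LeafSpec(bool),
}


def validate_change(spec: dict[str, LeafSpec], change: dict) -> list[str]:
    """Return a list of readable errors; an empty list means the change passes."""
    errors = []
    for key, value in change.items():
        leaf = spec.get(key)
        if leaf is None:
            errors.append(f"unknown field: {key}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly for int leaves.
        wrong_type = not isinstance(value, leaf.type_) or (
            leaf.type_ is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(f"{key}: expected {leaf.type_.__name__}")
            continue
        if leaf.minimum is not None and value < leaf.minimum:
            errors.append(f"{key}: {value} is below minimum {leaf.minimum}")
        if leaf.maximum is not None and value > leaf.maximum:
            errors.append(f"{key}: {value} is above maximum {leaf.maximum}")
    # Every field in the spec is treated as mandatory in this sketch.
    for key in spec:
        if key not in change:
            errors.append(f"missing field: {key}")
    return errors
```

A pipeline stage built this way would block deployment whenever the returned list is non-empty, so that a rejected change never reaches the live network and the engineer gets immediate feedback without involving the OSS team.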
Digital twin improves network utilization
BT’s OSS team has applied the concept of digital twins to the network. Network planning and optimization engineers can see a 3D replica of a chassis in a local exchange (central office) that tells them what ports are in use and what traffic is running on the switch. This allows an optimization engineer to spot opportunities to move services from one network node to another, removing redundant kit and saving on electricity consumption.
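The consolidation opportunity described above can be sketched in a few lines of code. The example below is a simplified illustration, not BT's method: the `Chassis` class, the 25% utilization threshold, and the greedy matching in `consolidation_candidates` are all assumptions made for the sake of the example, and a real tool would also weigh traffic load, port types, and resilience rules.

```python
from dataclasses import dataclass


@dataclass
class Chassis:
    """Hypothetical, simplified view of one chassis in a digital twin."""
    name: str
    total_ports: int
    ports_in_use: int

    @property
    def free_ports(self) -> int:
        return self.total_ports - self.ports_in_use

    @property
    def utilization(self) -> float:
        return self.ports_in_use / self.total_ports


def consolidation_candidates(chassis, threshold=0.25):
    """Suggest (source, target) moves that would empty lightly used chassis.

    A chassis is a source if its port utilization is below `threshold`.
    A target is any remaining chassis with enough free ports to absorb
    all of the source's in-use ports. Matching is a simple greedy pass.
    """
    sources = [c for c in chassis if c.utilization < threshold]
    targets = [c for c in chassis if c.utilization >= threshold]
    free = {t.name: t.free_ports for t in targets}
    moves = []
    # Place the smallest sources first so that more of them find a home.
    for src in sorted(sources, key=lambda c: c.ports_in_use):
        for tgt in targets:
            if free[tgt.name] >= src.ports_in_use:
                free[tgt.name] -= src.ports_in_use
                moves.append((src.name, tgt.name))
                break
    return moves
```

Each suggested move identifies a chassis that could be emptied and potentially powered down, which is the source of the electricity saving mentioned above.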
Visualizing the physical exchange, including space, layout, power, and cooling, enables more efficient capacity planning. For example, understanding if there is sufficient capacity to serve an enterprise customer on a network node can save a site visit from an engineer. And enabling engineers to make configuration change requests directly from the digital twin tool saves network planners from a lot of tedious manual tasks.
Network lifecycle management involves defining policies and rules for instantiating physical, logical, and virtual network resources. This process is facilitated by TOSCA orchestration templates that provide an intuitive way to express and design network configurations. At the design stage, specific aspects such as location management, device planning, and rack planning are captured in a resource design template, guiding network instantiation during runtime. This comprehensive approach ensures efficient and reliable network operations.
Single source of truth solves major pain point
According to Ramachandran, inventory is the crown jewel in OSS but also a major pain point. BT had multiple inventory systems that were sometimes disjointed and isolated.
To address this, BT’s OSS team developed a Service and Resource Inventory Management System (SRIMS). This single inventory system comprises multiple information layers:
- Building and support infrastructure.
- Floor plans.
- Power and cooling.
- Racks.
- Physical network infrastructure.
- Logical network elements.
- Virtual network constructs.
- The services that the network supports.
SRIMS also adopted a unified data model based on the TM Forum’s shared information and data model (SID), as shown in Figure 1.
Figure 1: BT’s data model for network inventory
Source: BT
According to Sreenath Gopalakrishna, Technology Director at BT, the inventory system that his team developed was able to capture information (such as power, space, and cooling) that is not always available in off-the-shelf inventory systems. He noted that many of the vendors and systems integrators that BT talked with were skeptical that a unified inventory system could be achieved. Nonetheless, with SRIMS, BT has successfully consolidated information from various inventory sources. This has reduced data errors, simplified the future evolution of the network, and enabled BT to shut down hundreds of legacy systems over the last four years.
SRIMS uses the Neo4j graph database. A graph, made up of nodes, relationships, and properties, is a natural representation of a network, and a graph database is better suited to representing networks than a traditional relational database. Among other things, it enables faster querying of connected data and can facilitate the finding of shortest paths.
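To illustrate why a graph suits this kind of question, the sketch below finds a fewest-hop path between two nodes using a breadth-first search. It is plain Python over an in-memory adjacency map, not Neo4j or its Cypher query language, and the topology in `LINKS` and the node names are invented for the example.

```python
from collections import deque

# Hypothetical topology: each pair is an undirected link between two nodes.
LINKS = [
    ("exchange-a", "core-1"),
    ("exchange-a", "core-2"),
    ("core-1", "core-3"),
    ("core-2", "exchange-b"),
    ("core-3", "exchange-b"),
]


def build_adjacency(links):
    """Turn a list of undirected links into a node -> neighbors mapping."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Return a fewest-hop path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

Because the relationships are stored directly alongside the nodes, a traversal like this only touches the links it follows. A relational schema would typically need a join, or a recursive query, for each additional hop.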
Gopalakrishna found that BT’s network could be easily modeled in a graph database using the TM Forum SID data model. In total, the SRIMS database has over 1 billion vertices (objects), each with an average of 30 properties.
When evaluating graph databases, BT looked at the support for different data models, licensing terms, the vibrancy of the open-source community, and the ability to host the database in a private cloud (not just a public cloud). Other factors included usability (administration user interface, data migration pipelines, etc.), the availability of connectors to other databases and tools, ease of upgrades, horizontal scalability, performance under stress, stability under user mistakes (e.g., a long-running query), and recovery time after a crash. Neo4j scored highest among the graph databases that BT considered.
SRIMS enables plan-and-build, service design, and provisioning for various BT products, including enterprise Ethernet and wholesale broadband. It typically handles over 5,000 order requests per hour and can handle close to 500,000 product availability requests per day. The system has over 1,000 users across BT.
Conclusion
BT’s OSS transformation has allowed its network engineers to launch network changes more quickly, reducing the TTM for new services. It has provided better visibility into network capacity and the ability to predict traffic, which allows for more just-in-time capex additions, thereby improving asset utilization. It also reduces the need for engineers to make site visits since they can get the information they need remotely.
Bringing the OSS team into the same organization as networking has improved understanding between the two disciplines. Software engineers have gained a better understanding of networks, and network engineers have gained an appreciation of IT concepts such as data modeling (e.g., YANG), service automation (e.g., TOSCA), and DevOps tooling (e.g., CI/CD). It has been an enriching experience for both.
As networks continue to evolve, customers become more demanding, and competition becomes more intense, operators must enhance their OSS to keep pace. By staying ahead of the curve, the OSS can effectively meet the dynamic demands of modern networks, ensuring seamless service delivery and enabling operators such as BT to maintain a strong position in the market and drive sustainable growth.
Appendix
Further reading
“How BT developed its own next-generation OSS” (August 2021)
Author
James Crawshaw, Practice Leader, Service Provider Transformation