Release notes
Maturity Level | Alpha |
---|---|
Release Status | Pre-production |
Version | x.x.x |
Team Approved Date | dd-Mmm-YYYY |
Approval Status | Draft |
IPR Mode | RAND |
Document Type | Use Case |
Notice
Copyright © TM Forum 2025. All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this section are included on all such copies and derivative works. However, this document itself may not be modified in any way, including by removing the copyright notice or references to TM FORUM, except as needed for the purpose of developing any document or deliverable produced by a TM FORUM Collaboration Project Team (in which case the rules applicable to copyrights, as set forth in the TM FORUM IPR Policy must be followed) or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by TM FORUM or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and TM FORUM DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY OWNERSHIP RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Direct inquiries to the TM Forum office:
181 New Road, Suite 304
Parsippany, NJ 07054 USA
Tel No. +1 862 227 1648
TM Forum Web Page: www.tmforum.org
1. Introduction
This Use Case focuses on Assurance of a network scenario incorporating multiple network technology domains:
- where several of the domains have active components that are self-managing and self-healing (Proactive/Predictive Maintenance),
- where a fault occurs in the physical network realizing connections that support multiple OSI connectivity layers and multiple technology domains (Reactive Maintenance).
This SD-WAN Use Case is inspired by:
- High value use cases proposed to the AN Self-Healing Domain Team: IG1373 AN Use Cases: A Guide to Self-Healing and Closed-Loop Automation v1.2.0, Access Domain use case using SDWAN services, also to be published as IG1373B.
- A fibre break / faulty Optical Line Termination (OLT) use case: 5.2.1.1. NaaS Service Intent Assurance: Fibre break, in TR313C ODA (Production) Components for NaaS Evolution v1.0.0 – TM Forum
IG1373B identified a high value use case comprising a cross-domain assurance capability that establishes network health and the probable causes of network faults/impairments, in the context of network technologies supporting several layers of the OSI stack (rather than focusing solely on concatenation of connectivity at a single layer in the OSI stack).
The key benefit identified in this self-healing use case is:
- A network failure within Self-Healing Domains does not lead to an immediate customer service failure, so it is easy to conclude that Operations staff need take no assurance action because the customer service is maintained.
- However, faults such as a failed router, failed optical transport or a fibre cut caused by a backhoe are physical conditions that do need to be repaired, because they impair Network Health (e.g. resilience and availability). They potentially extend the Mean Time to Repair (MTTR) of a subsequent fault that cannot be remediated automatically, and hence directly affect an end customer SLA.
The opportunity is to move from expensive reactive maintenance actions to repair a customer service, to a planned predictive approach, where there are enhanced options to schedule and deploy fewer staff and truck rolls and drive up operational efficiency.
1.1. Context or Background
This Use Case focuses on the determination of network health and Probable Cause Analysis, aka Root Cause Analysis (RCA), involving multiple Multi-Technology Domain Managers and multiple Operational Domains (aka Autonomous Domains), and on making a Next Best Repair/Remediation Action recommendation.
For this Use Case the cause is a physical line plant failure between Technology Domains, such as a fibre break caused by a backhoe damaging the physical ducts and cables.
Note that the access network is less likely to have standby or duplicated paths than the core network, where duplication is more prevalent. So in this example there is a mixture of self-healing and non-self-healing domains working together.
1.2. Objective of the use case
When a physical infrastructure fault (e.g. a line card or fibre failure) occurs in a mixture of self-healing and non-self-healing network domains:
- How do Operations staff determine that there is an outage/impairment to the Network Health? I.e. what kind of observability and metrics need to be available to Operations staff?
- How do they determine who, what, where and when to repair the fault? I.e. what are the operational procedures and mechanisms for using this information?
Currently there is limited experience in operating self-healing networks alongside non-self-healing networks. Hence best practices have yet to be established and will evolve over time.
1.3. Scope and assumptions
1.3.1. Scope
IG1373 AN Use Cases: A Guide to Self-Healing and Closed-Loop Automation v1.2.0 shows that many practical examples of high value assurance use cases depend on linking inventories of physical resources (OSI layers L0/L1) to inventories of the configuration and state of active equipment and software (operating at OSI L2/L3), and possibly to sources external to the network.
Self-managing Domains based on Controller concepts generally operate at the L2/L3 layers. In the L0/L1 physical domain, self-healing is usually limited to core networks, as self-healing in the access network is prohibitively expensive (CAPEX).
The linkage of information in inventories is part of creating a digital twin representation - a specialized form of a Topology Graph - that records the relationships among inventory information and allows for queries that navigate the relationships both within and between inventories for: Infrastructure L0, Transport L1, Logical L2/L3 and potentially L4 of the network.
Networks are intrinsically layered, and the challenge is to discover and establish relationships between assets at these different network layers so that observed symptoms can be related to the causes of network impairments/incidents.
This is illustrated in the IG1373B AN Use Cases: Network Quality Optimization & Fault Management v1.0.0 DRAFT (ANP-1236)
Fig 1.1 IG1373 Use Case Access Domain section 3.3.8 underlay network
This shows the layered nature of the network, illustrated as an overlay link (e.g. UCS in SDWAN) that is dependent on an IP Transport underlay (e.g. MPLS/IP Transport).
Similarly, there are further layers of dependency e.g. Fibre Transport layer and Physical Infrastructure.
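The layered dependencies described above can be sketched as a small dependency graph. This is a minimal illustration only, not a TM Forum or IETF data model: the entity names and the `SUPPORTED_BY` mapping are hypothetical, chosen to show how navigating between layers supports both impact analysis (which upper-layer entities does a failure touch?) and probable cause analysis (which lower-layer entity do several symptoms share?).

```python
from collections import defaultdict

# Hypothetical layered inventory: each entity lists the lower-layer
# entities it depends on (overlay -> IP underlay -> fibre -> infrastructure).
SUPPORTED_BY = {
    "sdwan-tvc-1": ["ip-path-a", "ip-path-b"],  # overlay with two underlay paths
    "sdwan-tvc-2": ["ip-path-b"],
    "ip-path-a": ["fibre-1"],
    "ip-path-b": ["fibre-2"],
    "fibre-1": ["duct-north"],
    "fibre-2": ["duct-north"],                  # shared duct: a common risk
}


def invert(supported_by):
    """Build the reverse relation: lower-layer entity -> entities it supports."""
    supports = defaultdict(set)
    for upper, lowers in supported_by.items():
        for lower in lowers:
            supports[lower].add(upper)
    return supports


def impacted_by(failed, supported_by=SUPPORTED_BY):
    """Return every entity that depends, directly or indirectly, on `failed`."""
    supports = invert(supported_by)
    seen, stack = set(), [failed]
    while stack:
        for upper in supports[stack.pop()]:
            if upper not in seen:
                seen.add(upper)
                stack.append(upper)
    return seen


def dependencies_of(entity, supported_by=SUPPORTED_BY):
    """Return every lower-layer entity that `entity` depends on."""
    seen, stack = set(), [entity]
    while stack:
        for lower in supported_by.get(stack.pop(), []):
            if lower not in seen:
                seen.add(lower)
                stack.append(lower)
    return seen


def common_causes(symptoms, supported_by=SUPPORTED_BY):
    """Return the lower-layer entities shared by all symptomatic entities."""
    sets = [dependencies_of(s, supported_by) for s in symptoms]
    return set.intersection(*sets) if sets else set()
```

In this toy inventory a break in `fibre-1` touches only `ip-path-a` and `sdwan-tvc-1`, whereas simultaneous symptoms on both IP paths point at the shared duct, which is the kind of cross-layer inference a single-layer view cannot make.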
It is frequently presumed that only network topologies and network data/telemetry feeds are needed for automated self-healing. But in many real-world use cases additional sources of data, information and knowledge are needed, for example power company information, civil emergency information, social events, meteorological information and more.
An example of layering may also emerge when considering the ServCo NetCo/InfraCo split:
Fig 1.2 Example of layering between ServCo and NetCo/InfraCo for SD-WAN service
In the following examples we assume the use of a mixture of MEF LSO API data models realized using TM Forum Open APIs, and interfaces realized by lower-level Coordinators, typically using IETF/BBF RESTCONF-based interfaces and YANG-based data models.
The positioning of this SDWAN exemplar relative to IG1224 NaaS Operational Domains and TR313C ODA Production Components is:
Fig 1.3 Mapping of exemplar to IG1224 NaaS Domain model
The NaaS Service exposed by ODA Production is an SDWAN Service based on NaaS APIs and using MEF SDWAN Service/Attributes as the API payload. In this SDWAN exemplar the service is exposed by a configured instance of a Service Management Intelligent Controller (SMIC) Component.
The IP (Underlay) Service is based on a Technology Domain Controller exposing TM Forum APIs using a MEF IP Service/Attributes payload. In practice, vendor-provided controllers may natively come with functionally equivalent RESTCONF/YANG interfaces, and some form of protocol adaptation may be required, possibly using an Operational Domain-specific SMIC Controller. This model is chosen because it is simpler and avoids overcomplicating the Use Case Flow Scenario with incidental detail.
The Fibre and Line Plant Infrastructure is assumed to be entirely passive, so that Operational Domain is supported by inventories based on TMFC012 Resource Inventory with schema extensions to support each technology domain.
Other functionally equivalent use cases could be constructed, but the assurance interaction patterns between Components (both ODA and NEP Networking Components) would be essentially unchanged.
1.3.2. Assumptions
This Use Case assumes that:
- Practical CSP operational scenarios will comprise a mixture of self-healing domains and traditional non-self-healing domains (Proactive and Reactive Maintenance).
- The Operational Domain Model from IG1224 NaaS corresponds to one form of self-healing Autonomous Domain.
1.3.3. References
- IG1373 AN Use Cases: A Guide to Self-Healing and Closed-Loop Automation v1.2.0 Access Domain use case using SDWAN services.
- Updates in IG1373B AN Use Cases: Network Quality Optimization & Fault Management v1.0.0 DRAFT (ANP-1236)
- TR313C ODA (Production) Components for NaaS Evolution v1.0.0 – TM Forum
- NaaS Transformation v13.0.0 (IG1224) – TM Forum
- TR255A Connectivity Patterns for Virtualization Management v4.0.1
- RFC 8345 - A YANG Data Model for Network Topologies
- RFC 9417: Service Assurance for Intent-Based Networking Architecture (SAIN)
- draft-ietf-nmop-network-incident-yang-03 - A YANG Data Model for Network Incident Management
- draft-ietf-nmop-network-anomaly-architecture-02 - A Framework for a Network Anomaly Detection Architecture
- draft-ietf-nmop-simap-concept-03 - SIMAP: Concept, Requirements, and Use Cases (SIMAP Service & Infrastructure Maps)
- RFC 8348 - A YANG Data Model for Hardware Management
- A YANG Data Model for Augmenting VPN Service and Network Models with Attachment Circuits
- RFC 9182: A YANG Network Data Model for Layer 3 VPNs
- RFC 8466 - A YANG Data Model for Layer 2 Virtual Private Network (L2VPN) Service Delivery
- RFC 9291 - A YANG Network Data Model for Layer 2 VPNs
- TR-454: YANG Modules for Network Map & Equipment Inventory
- ietf.org/archive/id/draft-ietf-nmop-terminology-16.txt
2. Description
2.1. IG1373B SD-WAN Use Case
This Use Case is focused on remediating and recovering from a physical fibre break or OLT failure between logical equipment in multiple locations, operating at multiple OSI levels, where different equipment falls under several Operational Domain governance regimes. This corresponds to the typical operational situation in most CSPs.
This proposal is derived from the SDWAN Use Case in the Self-healing Use Case document:
IG1373B AN Use Cases: Network Quality Optimization & Fault Management v1.0.0 DRAFT (ANP-1236)
The goal is to correctly identify the next best remediation action and achieve both:
- A minimum number of field force interventions and truck rolls.
- A reduced Mean Time to Repair/Remediation, which improves network health (e.g. resilience and availability).
The use case in IG1373B created the scenario below which has been enhanced with:
- Proposals for exemplar Autonomous Domain boundaries,
- Addition of the proposed ODA Component Service Management Intelligent Controller (SMIC), which creates a customer-facing end-to-end service management view derived from the collection of technology domains, e.g. SDWAN Controllers and the IP/MPLS underlay, underpinned by fiber controllers and physical infrastructure. This concept of an e2e Service Management Operational Domain/Manager comes from the IG1224 NaaS Operational Domain concept.
Figure 2.1 IG1373 Self-healing SDWAN Use Case
The SM Intelligent Controller works directly with each individual layer of the network, some working in reactive assurance mode and some in predictive/proactive assurance mode.
The common requirement is observability of the state of health of the constituent Operational and Technology Domains. This aligns with current analysis and proposals in the IETF NMOP initiative.
Use of Intent management interfaces means the provisioning process will operate on a standardized service model, but the assurance interactions may report both at a Service level for health and at a Resource level for impaired resources, together with links to the impacted Service entities. This information allows the SMIC to navigate between network (OSI) layers and carry out impact analysis.
The IETF NMOP team is proposing a common telemetry solution for assurance management and reporting across several IP networking solutions, where the differences between the solutions are the specific resource model and its mapping to standardized Service Models.
This Use Case reflects an evolution in the management of IP networks. Historically, IP network management has operated at the element management and device level, as shown on the left-hand side of the diagram below.
Recent IP Management Solutions are based on Network Management realized as controllers, as shown on the right-hand side below:
Ed Note: consider moving the following material to an appendix.
Figure 2.2 Evolution of IP Management
The evolution of IP Management has led to the introduction of Service based Intent interfaces as proposed in the NaaS concept of autonomous Technology Domain Managers.
Current vendor implementations of IP networking are adopting information models based on IETF models and standards. The main elements of these models and standards are summarized below.
Figure 2.3 IETF Network Controller models and standards for Technology Domain Managers
Note: It is probable that both service and resource/network level models need to be exposed by Controllers in addition to Intent interfaces.
This evolution mirrors the ITU evolution of Telecommunications Management Network (TMN) standards from Element Management supporting FCAPS functionality in M.3010, to the Service based models in M.3041 and M.3080.
It also mirrors the ITU-T proposal in draft Y.3061, Architecture framework for Autonomous Networks, with multiple cooperating controllers/orchestrators:
Figure 2.4 Draft Y.3061 Architecture Framework for AN and roles of Controllers
2.2. IG1373 Service based model
The implied model in IG1373B is complex as it:
- Covers multiple OSI layers.
- Uses multiple technologies.
- Uses multiple operational domains.
- Spans physical infrastructure and fiber, and logical networks operating at OSI levels 2 through 4.
To illustrate the different aspects of this complex model, this document has refactored the IG1373 model into an SD-WAN OSI Layered Service Model:
Figure 2.5 Refactored IG1373 SD-WAN OSI Layered Service Model
Ed Note: this diagram is derived from a single complex model, with each viewpoint captured in a drawing 'layer' so that only the viewpoints needed for each section of this document are exposed.
The ODA Production team explored placing Autonomous Domains around these entities, which produced a few insights:
- This is a complex task, and the domains are hierarchical across levels 3, 2.5, 2, 1 and 0, whereas originally it was thought that Cross-Domain Management would involve horizontally concatenated Connectivity Domains at the same OSI level.
- When networked domains are self-managing and self-healing, a break in a fiber link does not always lead to a hard fault that must be repaired immediately to restore service. In many cases the network health is impaired and the network is less resilient while the resource issue is left unresolved. However, to maintain network availability and resilience the resource repair must still be carried out, so the interaction with the supervising system might be a request to replace or repair a resource. For physical failures this requires personnel to be dispatched to carry out repairs. Initial PoC evidence shows that such approaches can lead to 40% fewer tickets being processed.
- Whilst the services are provisioned at the service/intent level, the reporting may be at the resource level. Consequently, the mapping of services to resources implies that this information is exposed by the resource inventory functions held within these self-managing domains.
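The distinction drawn above, between a hard fault that needs immediate repair and an impairment that leaves the service running with reduced resilience, can be sketched as a simple decision rule. This is an illustrative sketch only: the `ServiceHealth` fields and the three `Action` values are assumptions made for the example, not attributes defined by any TM Forum or IETF model.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NO_ACTION = "no action"
    PLANNED_REPAIR = "schedule planned repair"
    URGENT_REPAIR = "dispatch urgent repair"


@dataclass(frozen=True)
class ServiceHealth:
    """Hypothetical health summary reported by a self-managing domain."""
    service_id: str
    paths_provisioned: int  # paths the service was designed with
    paths_working: int      # paths currently able to carry traffic


def next_best_action(health: ServiceHealth) -> Action:
    """Map the remaining redundancy of a service to a repair recommendation."""
    if health.paths_working <= 0:
        return Action.URGENT_REPAIR   # customer service is down
    if health.paths_working < health.paths_provisioned:
        return Action.PLANNED_REPAIR  # service is up, resilience is reduced
    return Action.NO_ACTION
```

A service that has lost one of two paths would be recommended for a planned repair rather than an urgent one, which is where the opportunity to batch truck rolls comes from.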
This viewpoint shows: the physical equipment, the OSI layers that are supported and the proposed Autonomous Domains to support the use case.
It also proposes the service layers whose management information models for provisioning and assurance are documented in the following sections.
3. Information View
This section describes the Information models that interact in this use case:
- The SD-WAN Service (Overlay)
- The IP Transport Service Underlay (IETF ACTN based)
- The Fiber Transport Service
- The Infrastructure Service
Ed Note: the service-based model used for provisioning may need extension to include a resource level model for notifications of impaired resources within self-healing domains.
Whilst the models that follow cover the logical models for the provision of the services, the assurance processes simply need to use these models to identify and label degradation and faults in services and resources. Hence the assurance models in Section 3.3 are the principal focus for interactions among network and ODA Components.
3.1. Modelling
TR255A Connectivity Patterns for Virtualization Management v4.0.1 documents a model for representing networks and connections. This model separates the static part of connectivity from the dynamic part on a per-layer basis. For example, what is regarded as a static network connection at level 3 might actually be viewed as a dynamic flow from a level 2 viewpoint.
The core concepts are shown below:
Figure 3.1 Connectivity service model TR255 GB999 ODA Production Implementation Guidelines
This concept of connections/trails and flows has a long history in ITU-T G.805 and G.809 (connectionless), with concepts such as link connection, trail, sub-network and connection point. These evolved and form the basis of the ONF TR512.4 Common Information Model, which was adopted across several SDOs including TM Forum (incorporated in GB922), MEF, 3GPP and ETSI. There is also a concept of a link to concatenate between Connectivity Service Domain Termination Points at the same OSI level.
3.2. SD-WAN (Overlay) Service Information Model (MEF 70.1): Provisioning
For this exemplar the SD-WAN Service model used is the MEF 70.1 Service Model and Attributes, part of the MEF 3.0 SD-WAN Service Standards.
Figure 3.1.1 SD-WAN Service Model (derived from MEF 70.1)
Key for MEF SDWAN Service Model MEF 70.1
The scope of this MEF SD-WAN model and concepts above is shown in the refactored IG1373 SDWAN Use Case Layered Model:
Figure 3.1.2 SD-WAN service model mapped to refactored IG1373 SD-WAN OSI Layered Service Model
Ed note: This diagram is a specific view of the Gliffy master diagram in the appendix section 6
3.2.1. Mapping MEF SDWAN model entities to TM Forum SID/ ONF
Figure 3.1.3 Information model MEF SDWAN Service
This diagram provides a UML class representation of the MEF SDWAN Service Model.
3.2.2. Mapping table for SDWAN based on TR255A
The following table maps the SDWAN classes to the equivalent classes in TR255A and Information Framework GB922.
The main benefit is the addition of SDWAN service attributes/ characteristics to TR255A/ Information Framework (GB922).
This is valuable when creating JSON Schema extensions to be used with Open APIs when using the polymorphic extension pattern.
These entities and attributes provide the structure for assurance reports defining impaired services and resources.
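The polymorphic extension pattern mentioned above can be illustrated with a short sketch. The `@type`, `@baseType` and `@schemaLocation` properties are the markers conventionally used for polymorphism in TM Forum Open API payloads; everything else here is an assumption for illustration, including the subtype name `SdwanService`, the placeholder schema URL and the two SD-WAN attributes, none of which are taken from MEF 70.1 or from a published schema.

```python
import json


def extend_service(base: dict, type_name: str, schema_location: str, **extra) -> dict:
    """Return a copy of `base` specialised via the @type/@baseType convention."""
    extended = dict(base)
    extended["@baseType"] = base.get("@type", "Service")
    extended["@type"] = type_name
    extended["@schemaLocation"] = schema_location
    extended.update(extra)
    return extended


base_service = {"id": "svc-001", "@type": "Service", "state": "active"}

sdwan_service = extend_service(
    base_service,
    type_name="SdwanService",  # hypothetical subtype name
    schema_location="https://example.com/schemas/SdwanService.schema.json",  # placeholder
    sdwanUniCount=2,                              # hypothetical attribute
    tunnelVirtualConnections=["tvc-1", "tvc-2"],  # hypothetical attribute
)

print(json.dumps(sdwan_service, indent=2))
```

A consumer that only understands the base type can still process the payload by reading `@baseType`, while an SD-WAN-aware consumer can fetch the schema named in `@schemaLocation` to validate the additional attributes.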
3.3. SD-WAN (Overlay) Service Information Model: Assurance
Ed Note: In this release we identify the sources of assurance information models. Further work is needed to create definitive impairment type specifications. The expectation is that further studies will result in both formal ontologies and JSON Schemas for health and network impairments.
As the SDWAN Service is self-healing, a proactive Assurance approach is needed.
The Assurance models are based on both the Service models and their related Resource models. The Service-Resource relationships are managed within the service, which allows self-healing functions to change them, and they may change frequently (e.g. through virtualization). Hence these relationships cannot be held in traditional OSS inventories outside the service.
- The Service model reports status changes, which may be hard failures in some functions (e.g. SDWAN UNI, SDWAN EDGE) or health impairments (e.g. in a Tunnel Virtual Connection) arising from faulty or impaired resources forming the Underlay Connectivity Service (UCS).
- Resource models are vendor specific, i.e. the resource model is not standardized (but may work to common resource specifications). State changes or impairments in these models are reported together with the relationship to the impacted service model entities. Accurate timestamping of events and state changes of service and resource impairments in reports is critical for temporal correlation of changes.
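To show why accurate event timestamps matter, the following sketch groups impairment reports by event time. It is a minimal illustration under stated assumptions: the `ImpairmentReport` structure, its field names and the five-second window are hypothetical, and the gap-based grouping is just one simple correlation technique, not one prescribed by the referenced models.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ImpairmentReport:
    """Hypothetical resource-level report carrying its impacted services."""
    resource_id: str
    occurred_at: datetime  # time the event happened in the network
    impacted_services: tuple = ()


def correlate(reports, window=timedelta(seconds=5)):
    """Group reports in time order, starting a new group whenever the gap
    to the previous report exceeds `window`."""
    groups, current = [], []
    for report in sorted(reports, key=lambda r: r.occurred_at):
        if current and report.occurred_at - current[-1].occurred_at > window:
            groups.append(current)
            current = []
        current.append(report)
    if current:
        groups.append(current)
    return groups


t0 = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
reports = [
    ImpairmentReport("ip-link-7", t0 + timedelta(seconds=2), ("sdwan-tvc-1",)),
    ImpairmentReport("fibre-1", t0),
    ImpairmentReport("olt-3", t0 + timedelta(seconds=60)),
]
groups = correlate(reports)
```

Here the fibre and IP link reports fall into one group and the OLT report into another. If the timestamps recorded when a report was registered rather than when the event occurred, the grouping could easily split or merge unrelated events.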
IETF work on Intent-based network assurance is based on a number of RFCs and Internet-Drafts:
- RFC 9417: Service Assurance for Intent-Based Networking Architecture (SAIN)
Services rely upon multiple subservices provided by a variety of elements, including the underlying network devices and functions, so getting assurance of a healthy service is only possible with a holistic view of all involved elements.
This architecture not only helps to correlate the service degradation with symptoms of a specific network component, it also lists the services impacted by the failure or degradation of a specific network component.
- RFC 8345 - A YANG Data Model for Network Topologies
Abstract data model to represent networks and topologies. The data model is divided into two parts:
The first part of the data model defines a network data model that enables the definition of network hierarchies, or network stacks (i.e., networks that are layered on top of each other) and maintenance of an inventory of nodes contained in a network.
The second part of the data model augments the basic network data model with information to describe topology information.
- SIMAP Service & Infrastructure Maps: draft-ietf-nmop-simap-concept-03 - SIMAP: Concept, Requirements, and Use Cases
Data model that provides a view of the operator's networks and services, including how it is connected to other models/data (e.g., inventory, observability sources, and operational knowledge).
It specifically provides an approach to model multi-layered topology and an appropriate mechanism to navigate amongst layers and correlate between them.
This includes layers from physical topology to service topology.
This model is applicable to multiple domains (access, core, data center, etc.) and technologies (Optical, IP, etc.).
The SIMAP modelling defines the core topological entities (network, node, link, and termination point) at each layer, their role in the network topology, core topological properties, and topological relationships both inside each layer and between the layers.
An example of the use of this model, which is based on the ONF TR512.4 Common Information Model, is incorporated in the TM Forum Information Framework as contributed by TR255.
Examples of IETF usage of these models are in:
- Reactive Maintenance/Assurance: draft-ietf-nmop-network-incident-yang-03 - A YANG Data Model for Network Incident Management
- Proactive Maintenance/Assurance: draft-ietf-nmop-network-anomaly-architecture-02 - A Framework for a Network Anomaly Detection Architecture
3.3.1. General Observability Model
Whilst provisioning may be largely done at a network service level, assurance processes also need to be able to observe the resources that are realizing these services.
Shown below is an example of the distinction between a Service and a Resource. A CFS may declaratively describe an SD-WAN Flow while a Resource Function may describe a Flow as realized in the network:
Fig 3.3.1 TMF664 request is declarative while the response/result may include the end-to-end topology as deployed.
For a self-healing domain, the resources and the topology underpinning a service may change as a result of decisions made by controllers, or of health changes when incidents/faults occur. At least two forms of observability are needed:
- Responses to Resource Inventory queries that may produce dynamic results reflecting up-to-date information, in some cases in the form of topology information (TMF664 Resource Activation and Configuration, and TMF686 Topology API).
- State changes that may trigger Resource Inventory (TMF639) notifications when a resource is impaired/faulty or, where service health is impacted, Service Inventory (TMF638) state change updates and subsequent notifications.
- The Service Problem Management API (TMF656) has some features for reporting impaired services and resources.
- It is possible that these notifications need to use the TMF688 Event Management API so that events can be streamed by topic to relevant consumers.
Use of TMF688 would appear to fit with the thinking on the democratization of data in the Modern Data Architecture team.
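The idea of streaming events by topic to relevant consumers can be sketched with a small in-memory hub. This is a stand-in to illustrate the interaction pattern only; it is not the TMF688 API, and the class name, method names and topic strings are all hypothetical.

```python
from collections import defaultdict


class EventHub:
    """In-memory stand-in for a topic-based event hub (not the TMF688 API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register `callback` to receive every event published on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver `event` to each subscriber of `topic`; return the count."""
        for callback in self._subscribers[topic]:
            callback(event)
        return len(self._subscribers[topic])


hub = EventHub()
received = []
hub.subscribe("resource.impaired", received.append)  # hypothetical topic name
hub.publish("resource.impaired", {"resourceId": "fibre-1", "impactedServices": ["sdwan-tvc-1"]})
hub.publish("service.degraded", {"serviceId": "sdwan-tvc-1"})  # no subscriber, so dropped
```

Topic-based delivery lets an SMIC subscribe only to the impairment topics of the domains it supervises, rather than polling each inventory for state changes.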
3.3.2. Reactive assurance models
The IETF reactive model describes a set of functions that provide observability of Network Resources. For this use case the main addition is to include in these resource reports the list of impacted services and timestamps.
3.3.2.1. Incident Identification service VPN Degradation Example
Fig 3.3.2 Example of Network Incident Identification IETF NMOP Network Incident YANG-03
In this use case the Service Management Intelligent Controller incorporates the Orchestrator functionality, and the Observability API needs to support Service and Resource health impairments. The example here is a VPN degradation arising from packet loss, path delay, or both.
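The VPN degradation example above can be sketched as a simple threshold check that reports which symptoms are present. The `VpnSample` structure, the cause labels and the default thresholds (1% loss, 50 ms delay) are assumptions chosen for illustration; real thresholds would come from the service's SLA or intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnSample:
    """Hypothetical measurement of a VPN path at one point in time."""
    packet_loss_pct: float
    path_delay_ms: float


def degradation_causes(sample, max_loss_pct=1.0, max_delay_ms=50.0):
    """Return the list of symptoms that exceed their (assumed) thresholds."""
    causes = []
    if sample.packet_loss_pct > max_loss_pct:
        causes.append("packet-loss")
    if sample.path_delay_ms > max_delay_ms:
        causes.append("path-delay")
    return causes
```

An empty list means the sample is within bounds, while a list with both entries corresponds to the "either or both" case described above.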
3.3.3. Alarm Incident Management Service Interworking Reactive
Fig 3.3.3 Interworking with Alarm Management IETF NMOP Network Incident YANG-03
The SM Intelligent Controller can perform either or both of the e2e OSS functions and the operational domain controller roles above. The APIs required for the controller, as implied by the diagram above, include:
- Observability, including metrics and traces/logs.
- Fault/alarm and incident reporting from supporting controllers.
These interactions are also being explored in the AN Team for use with AI, and are reported in IG1343 Using AI to Enable Network Fault Detection, Resolution and Configuration v1.0.0 DRAFT.
3.3.4. Proactive assurance models
IETF are developing a proactive assurance model in draft-ietf-nmop-network-anomaly-architecture-02 - A Framework for a Network Anomaly Detection Architecture.
Functionally equivalent work has been specified by the TM Forum AI-Closed Loop Automation team in AI Closed Loop Automation – Anomaly Detection and Resolution v2.1.0 (IG1219) – TM Forum
3.3.4.1. IETF Predictive assurance model
The IETF draft-ietf-nmop-network-anomaly-architecture-02 - A Framework for a Network Anomaly Detection Architecture describes a set of co-operating functions that together can predict anomalies and incidents.
Figure 3.3.4 IETF NMOP Framework for a Network Anomaly Detection Architecture
A few observations:
- The results of these architecture functions are streaming messages that report network anomalies/incidents.
- This approach is different from traditional fault managers, as it is based on collecting data via telemetry sources and then reasoning about anomalies and outliers.
- These functions form part of a control loop covering awareness, analysis and decision-making functions, aka OODA.
- These functions are controlled and managed by a Technology Domain Controller that adds closed-loop management and orchestration functions that direct the components that execute decisions.
Separating the control loop functions from their management/orchestration in a controller permits the controller to control multiple specialist anomaly solutions, each optimized for a specific technology.
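As an illustration of "reasoning about anomalies and outliers" from telemetry, the sketch below flags outliers using the modified z-score based on the median absolute deviation (MAD). This is a generic statistical technique chosen for the example, not the method specified in the IETF draft; the function name and the 3.5 threshold (a common rule of thumb) are assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median of the absolute deviations from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical path-delay samples in milliseconds; index 6 is the spike.
delays_ms = [10, 11, 10, 12, 11, 10, 95, 11]
suspect_indices = mad_outliers(delays_ms)
```

Using the median rather than the mean keeps a single large spike from masking itself, which matters when telemetry windows are short.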
3.3.4.2. TMF Anomaly /Assurance predictive model
Three components were proposed for implementing part of the assurance analysis, prediction and mitigation decisions.
- TR309 Anomaly Predictor ODA Component Requirements v1.1.0 – TM Forum
- TR309A Anomaly Predictor Business Process Scenarios v1.0.0 – TM Forum
- TR310 Anomaly Mitigator ODA Component Requirements v1.1.0 – TM Forum
Subsequently it was decided that these would be features of a single Anomaly Management Component, TMFC041.
The architectural model for Anomaly Management is described in TR284A.
Figure 3.3.5 Anomaly management Closed Loop functions TR284A
This architectural framework identifies the functions that are needed for Anomaly Management Closed Loops.
The stages used are based on the Observe Orient Decide Act model which is functionally similar to the awareness, analysis, decision and execution model used in the Autonomous Network Functional Architecture.
The architectural model for event processing is documented in TR284D.
Figure 3.3.6 Framework of Anomaly Event processing TR284
Anomaly Detection: the group of tasks that monitor the quality/state of the network and services, such as collecting network/service information, preprocessing that information and identifying exception data, to support awareness for assurance.
Anomaly Event Assessment: provides analysis of the data collected by the awareness or anomaly detection phase. It consists of four submodules: service impact analysis, anomaly event identification, demarcation, and locating.
Anomaly Event Mitigation: matches, evaluates, determines and executes the anomaly event mitigation solution, and verifies and reports the service recovery status. It covers both decision and execution.
Anomaly Event Learning Management: knowledge recycling is responsible for extracting and applying knowledge. It extracts anomaly event handling knowledge using technologies such as knowledge graphs, and applies that knowledge to the iEM system. The knowledge includes anomaly event identification rules, diagnosis and location logic, resolution matching rules, and service verification policies.
This process continuously enriches and enhances the automatic closed-loop capability of the EM system for anomaly events.
3.3.5. Information Models for Assurance
Development of precise assurance models for the service used in this use case will be added in a later release.
In this release we identify the sources of models that need enhancement or modification.
The working assumption is that the model needs to cover Incident and Fault Management types, and that these need to be associated with impacted services and resources, referencing the services and resources defined in the provisioning models identified in this document.
Explicit types need to be developed for each service to support both temporal and spatial correlation.
Current APIs tend to be incomplete or generic, often using string types rather than the strong typing that is needed for effective correlation.
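The difference between string-typed and strongly typed impairment fields can be sketched as follows. The `ImpairmentType` members are hypothetical examples, not an agreed enumeration; the point is that a closed set of values lets a correlator compare reports reliably and reject unknown values at the boundary.

```python
from enum import Enum


class ImpairmentType(Enum):
    """Hypothetical closed set of impairment types for one service."""
    FIBRE_BREAK = "fibre-break"
    OLT_FAILURE = "olt-failure"
    PACKET_LOSS = "packet-loss"


def parse_impairment(raw: str) -> ImpairmentType:
    """Normalise free text to a typed value; raise ValueError if unknown."""
    normalised = raw.strip().lower().replace(" ", "-").replace("_", "-")
    return ImpairmentType(normalised)


# Three spellings that a string comparison would treat as different values.
spellings = ["Fibre Break", "fibre_break", " fibre-break "]
typed = {parse_impairment(s) for s in spellings}
```

With free-form strings the three spellings would be counted as three distinct fault types, whereas the typed form collapses them to one and fails loudly on anything outside the enumeration.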
3.3.5.1. TM Forum Alarm Management TMF642
The Alarm Management API support the current alarm types identified are:
wit:h severity
The current data model for Alarm is
This does have a set of attributes that could be enhanced:
- AlarmRaisedTime: the semantics may need clarification that the 'time' is that of the event occurring in the network, not of its registration in the OSS/Controller.
- AlarmedObjectType: needs to be based on provisioning models.
- ProposedRepairedAction: needs to relate repairs to resource models, some of which may be vendor specific.
However, the data model needs to support multiple fault types and multiple incident types in a single report. Proof of Concept trials are showing that the volume of faults and incidents is a major implementation concern.
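The two points above (network event time versus registration time, and several faults per report) can be sketched as follows. This is an illustrative structure only: `FaultEntry`, `BatchedReport` and their fields are hypothetical names and do not reproduce the TMF642 Alarm schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FaultEntry:
    fault_type: str          # hypothetical type label
    alarmed_object_id: str   # would reference a provisioning-model entity
    event_time: datetime     # when the fault occurred in the network


@dataclass
class BatchedReport:
    received_time: datetime  # when the OSS/Controller registered the report
    faults: list[FaultEntry] = field(default_factory=list)

    def earliest_event(self) -> datetime:
        """Network time of the first fault carried in this report."""
        return min(f.event_time for f in self.faults)

    def reporting_delays(self) -> list[timedelta]:
        """Gap between each network event and its registration."""
        return [self.received_time - f.event_time for f in self.faults]
```

Keeping the two timestamps separate lets a correlator order faults by when they happened in the network, while carrying many faults per report is one possible way to reduce message volume.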
3.3.5.2. TM Forum Incident Management TMF742
There is a data model for incidents in TMF742.
This does have:
- Incident Detail (string), which probably needs extension to cover health types enumerated for each service.
- Occur time: the semantics may need tightening up.
- Resource entity: may need to add Service, or use Resource Functions that can be used in both Service and Resource Models.
Data models for these APIs have some extensions based on ITU-T X.733 and 3GPP TS 32.111-2 Annex B and could be the best place for adding extension types.
Ed Note: The current Service Management Intelligent Controller (SMIC) Component specification may need to add this API for observability purposes.
3.3.5.3. TM Forum Service Problem Management TMF656
The Service Problem Management API TMF656 uses a Service Problem Schema; this identifies a number of problem types but is currently weakly typed, using String types.
3.3.5.4. IETF Incident Management Models
The IETF has some YANG models for Incidents in draft-ietf-nmop-network-incident-yang-03: A YANG Data Model for Network Incident Management.
3.4. IP Transport (Underlay) Service Information Model (IETF ACTN) Provisioning
There are a couple of possible IP Transport models. The first is based on the IETF:
- RFC 8453: Framework for Abstraction and Control of TE Networks (ACTN), and the associated
- RFC 8454: Information Model for Abstraction and Control of TE Networks (ACTN)
which have an extension for L3VPN in RFC 8299: YANG Data Model for L3VPN Service Delivery.
The other candidate model for IP Transport Service is from MEF in:
- MEF 69.1: Subscriber IP Service Definitions
- MEF 61.1: IP Service Attributes
which has a mapping to RFC 8299: YANG Data Model for L3VPN Service Delivery.
3.4.1. MEF IP Service
MEF has the following high-level model for (Subscriber) IP Service
Figure 3.4.1 MEF IP Service Model (MEF 69.1, MEF 61.1)
3.4.2. Mapping table for IP Service based on TR255A
The high-level Information Model for MEF IP Services is
Figure 3.4.2 Information model MEF IP Service
This diagram provides a UML class representation of the MEF IP Service Model.
The following table maps the IP Service classes to the equivalent classes in TR255A and Information Framework GB922.
The main benefit is the addition of IP Service attributes/characteristics to TR255A/Information Framework (GB922).
This is valuable when creating JSON Schema extensions to be used with Open APIs when using the polymorphic extension pattern.
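As a rough illustration of the polymorphic extension pattern, the sketch below shows a payload that extends a base Service with an IP-specific attribute. Only `@type`, `@baseType` and `@schemaLocation` belong to the Open API pattern; the type name `IpSubscriberService`, the schema URL, the attribute `exampleIpAttribute` and the helper `resolve_type` are assumptions made for this example.

```python
# Hypothetical extended entity; the attribute names other than the three
# "@" properties are placeholders, not MEF or TM Forum definitions.
ip_service = {
    "@type": "IpSubscriberService",
    "@baseType": "Service",
    "@schemaLocation": "https://example.org/schemas/IpSubscriberService.schema.json",
    "name": "ip-vpn-site-a",
    "exampleIpAttribute": "value-defined-by-the-extension-schema",
}


def resolve_type(entity: dict, known_types: set) -> str:
    """Pick the most specific type the receiver understands.

    A receiver that knows the extension handles it as such; one that only
    knows the base type can still process the common attributes.
    """
    if entity.get("@type") in known_types:
        return entity["@type"]
    if entity.get("@baseType") in known_types:
        return entity["@baseType"]
    raise ValueError("unknown entity type")
```

A consumer that only understands `Service` can therefore still accept the payload, while a consumer aware of the extension gains the strongly typed IP Service attributes.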
3.5. Fibre Transport and Infrastructure Service Information Models (TMF Information Framework)
Fiber access and transport are specified by IEEE and ITU-T. The following table shows the initial work on the access and transport infrastructure.
Figure 3.5.1 Transport model sources
The following diagrams show exemplar Physical Infrastructure Models for Access and Core Transport network used to realize fibre based networks.
Figure 3.5.2 Passive Infrastructure Access network
Ed Note: UML models are to be added, along with a description of the entities. This will be based on an mTOP contribution (see the comments in this working document).
Figure 3.5.3 Exemplar passive Infrastructure - Core transport
Ed Note: UML models are to be added, along with a description of the entities. This will be based on an mTOP contribution (see the comments in this working document).
There is a possibility of using draft-ietf-ivy-network-inventory-topology-01 as the basis of a formal Information Model in the SID. An example of the current IETF model is shown below:
Figure 3.5.4 Draft IETF Network Inventory model in YANG
It is possible to translate the IETF YANG model into an equivalent JSON Schema for use in a Network Inventory such as TMFC012 Resource Inventory.
For further study.
4. Sequence diagrams
The following simplified scenario shows a backhoe severing a fibre and the consequential activities and actions.
Asynchronously, several components start reporting service and resource health impairments and specific Incidents/Faults in physical resources.
The assumptions are that:
- All reports are time-stamped for temporal correlation.
- The Service Management Intelligent Controller has the responsibility to:
- Evaluate and diagnose the incoming reports, and determine actions including,
- Request relationship and topology information from service and resource inventories, including those within the SDWAN and IP Service Controllers,
- Recommend, through work orders, actions to repair impairments and restore the network Health.
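The temporal correlation assumed above could be sketched as follows. This is a deliberately simple gap-based clustering, not the SMIC's actual algorithm; the function name `group_by_time`, the 30 second window and the report tuple layout are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def group_by_time(reports, window=timedelta(seconds=30)):
    """Cluster (timestamp, source) reports in time order.

    A report joins the current cluster when it arrives within `window` of
    the previous report; a longer gap starts a new cluster.
    """
    clusters = []
    for ts, source in sorted(reports):
        if clusters and ts - clusters[-1][-1][0] <= window:
            clusters[-1].append((ts, source))
        else:
            clusters.append([(ts, source)])
    return clusters
```

Each resulting cluster is a candidate set of reports that may share a cause, which can then be checked against topology information from the inventories.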
Fig 4.1 Exemplar network health restoration sequence arising from fibre break
Note that in this networking example, the networking components are vendor supplied and need to interoperate with ODA Components, in some cases using interfaces defined by other organizations, e.g. MEF models enhancing TM Forum Open APIs, and IETF protocols and YANG-based models.
In this example a fiber break may be reported by an external party, but in practice it will be preceded by numerous reports about network impairments (Service and Resource level) that are generated by multiple parts of the network and, in this exemplar, received by the Service Management Intelligent Controller (SMIC).
The SMIC then analyses these reports, gathers supporting information, including that from Fiber and Line Plant Inventories, and makes recommendations on actions to restore network Health.
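One simple form of the spatial analysis described above is to look for resources that all impaired services depend on. The sketch below assumes the inventory can be flattened into a service-to-resources mapping; the function name `probable_common_resources` and the identifiers in the usage note are hypothetical.

```python
from collections import Counter


def probable_common_resources(impaired_services, service_to_resources):
    """Return, sorted, the resources that every impaired service depends on."""
    services = set(impaired_services)
    counts = Counter()
    for service in services:
        # Count each resource once per service, even if listed twice.
        counts.update(set(service_to_resources.get(service, ())))
    return sorted(r for r, n in counts.items() if n == len(services))
```

For example, if an SD-WAN service and an IP VPN service are both impaired and the inventory shows that they share one fibre segment but use different routers, that fibre segment is the only resource returned and becomes the first candidate for inspection. A real controller would also need to weigh partial overlaps and layered dependencies, which this sketch ignores.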
These diagnostic and decision processes are not simple, but this ODA Production framework establishes the environment in which such an intelligent controller can operate, and facilitates the development of improved AI algorithms to reduce the burden on Operations staff managing Assurance and Network Health.
This AI-enabled assurance addresses the two operational challenges identified in the objectives: awareness of the network health, and guidance on what network repairs are needed to restore network Health.
There are two areas where alternative interaction sequences might be possible:
- In this example, network Health is assumed to be reported to a Service Problem Management Component, but it could have been reported to an SLA Management Component if one was specified.
- Requests for Work Orders need to be validated by Operations staff. However, these might be submitted through a separate system such as a Trouble Ticket Component rather than the SMIC proposed here.
This approach has the benefit that it allows for automated invocation of repairs once confidence in AI decision making has been established by Operations people.
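The staged move from validated to automated repairs could be expressed as a simple routing rule, sketched below. The `RepairRecommendation` type, the `confidence` score and the `auto_threshold` parameter are assumptions for illustration; this use case does not define how confidence is measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepairRecommendation:
    work_order: str
    confidence: float  # hypothetical score in [0, 1] from the controller


def route(rec: RepairRecommendation, auto_threshold: Optional[float] = None) -> str:
    """Return "auto" or "manual" for a recommended work order.

    With no threshold configured, every recommendation goes to Operations
    staff for validation, matching the initial assumption in this use case.
    """
    if auto_threshold is not None and rec.confidence >= auto_threshold:
        return "auto"
    return "manual"
```

Operations staff would leave the threshold unset at first and introduce it only once they trust the recommendations.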
5. Conclusions
5.1. Lessons learned
Cross Domain Health & Probable Cause Analysis for a Fibre Fault/Break is a complex topic, as it involves multiple interacting networking technologies, multiple OSI levels, and the need to link information and inventory for both physical networks and logical network functions.
Assurance processes are a high value use case if they help Operations people address two questions:
- How do Operations staff determine that there is an outage/impairment to the Network Health?
I.e. what kind of observability and metrics need to be available to Operations staff?
- How do they determine who, what, where & when to repair the fault?
I.e. what are the operational procedures and mechanisms for using this information?
These questions are more challenging when networks are self-managing and self-healing using proactive and predictive mechanisms.
What is needed are:
- Management solutions that integrate with network equipment suppliers' current Controller-based solutions. The component boundaries of management functions for deployment have to match those of the network controllers, which means component boundaries are not based purely on functional or information model boundaries.
- A common observability model across multiple Network technologies and multiple OSI levels.
- Interfaces that allow for flexible exchange of information between controllers operating at multiple OSI levels.
- Interfaces that provide time-stamped events for state changes in network and service impairments.
- An uplift of the draft specification for the Service Management Intelligent Controller to support observability requirements, including the addition of dependent APIs.
5.2. Impacts identified
There is a need to extend both Information Framework and API data models to support concrete network technologies such as those considered in this use case, e.g. SD-WAN, IP Transport, Fibre and Physical Infrastructure.
These models are fundamental to correlating events temporally and spatially across multiple technologies and OSI levels.
Given the complexity of networks and the skills required, the lead for the development and validation of these models needs to come from members with in-depth networking knowledge, as this is not commonly present within the skill sets of API developers or information modelers.
Additional analysis and enhanced coordination and alignment are needed with the proposals in IG1343 Using AI to Enable Network Fault Detection, Resolution and Configuration v1.0.0 DRAFT.
6. Appendix
6.1. Use Case Autonomous Domain Layering
The box infrastructure is described in IG1373.
6.1.1. Canonical model from which previous models are derived
Figure A 6.1 Refactored layers model derived from IG1373 SD-WAN Use Case
This diagram is used to derive all the other diagrams in this report by hiding layers that are not relevant to the discussion.
6.2. Terminology
The IETF has recently produced a recommended set of terms for Network Management in:
ietf.org/archive/id/draft-ietf-nmop-terminology-16.txt
For the purposes of this use case we use these terms:
IETF Term | IETF Definition | Interpretation |
---|---|---|
Problem | A State regarded as undesirable and that may require remedial action. A Problem cannot necessarily be associated with a Cause. The resolution of a Problem does not necessarily act on the thing that has the Problem. * Note that there is a historic aspect to the concept of a Problem. The current State may be operational, but there could have been a Fault that is unexplained, and the fact of that unexplained recent Fault is a Problem. * Note that while a Problem is unresolved it may continue to require attention. A record of resolved Problems may be maintained in a log. * Note that there may be a State which is considered to be a Problem from several perspectives. For example, consider a "loss of light" State that may cause multiple services to fail. In this example, a new State (the light recovers) may cause the Problem to be resolved from one perspective (the services are operational once more), but may leave the Problem as unresolved (because the loss of light has not been explained). Further, in this example, there could be another development (the reason for the temporary loss of light is traced to an microbend in the fiber that is repaired) resulting in that unresolved Problem now being resolved. But, in this example, this still leaves a further Problem unresolved (a microbend occurred, and that Problem is not resolved until it is understood how it occurred and a remedy is put in place to prevent recurrence). | |
(Resource) State | A particular Condition that a Resource has (i.e., it is in a State) at a specific time. It may be helpful to qualify this as "Resource State" to make clear the distinction between this and other uses of "state" such as "protocol state". | For assurance and observability it is necessary to send Resource State changes with timestamps using telemetry / intent reporting. |
Incident | A (Network) Incident is an undesired Occurrence such as an unexpected interruption of a network service, degradation of the quality of a network service, or the below-target performance of a network service. An Incident results from one or more Problems, and a Problem may give rise to or contribute to one or more Incidents. Greater discussion of Network Incident relationships, including Customer Incidents and Incident management, can be found in [I-D.ietf-nmop-network-incident-yang]. | Use this term to describe a network Impairment, i.e. a reduction in Health affecting the integrity or resilience of the network, such as a Line Card fault or physical connection failure. This abstracts and encapsulates other concepts which may be used internally within a self-healing domain (see IETF Terminology). This scenario assumes incidents are timestamped. |
7. Administrative Appendix
7.1. Document History
7.1.1. Version History
This section records the changes between this and the previous document version as it is edited by the team concerned. Note: this is an incremental number which does not have to match the release number and is used for change control purposes only.
Version Number¹ | Date Modified | Modified by: | Description of changes |
---|---|---|---|
v0.0.1 | | Document Creation | |
v0.0.2 | | Editors draft | |
v0.0.3 | | Editor's Team approved draft | |
v1.0.0 | | Final administrative edits prior to publication | |
¹ Examples: v1.0.1 for minor changes, v1.1.0 for major change with compatibility with the previous version, v2.0.0 for major change without entire compatibility with the previous version. Refer to the TMF official document describing version management rules.
7.1.2. Release History
Release Status | Date Modified | Modified by: | Description of changes |
---|---|---|---|
Pre-production | | Rosie Wilson | Initial Publication |
7.2. Acknowledgments
This document was prepared by the members of the TM Forum End-to-End ODA team.
Team Member (@mention) | Company | Role* |
---|---|---|
Brad Peters | NBN Co | Subject Matter Expert contributor/ reviewer |
Vance Shipley | SigScale | Subject Matter Expert contributor |
Dave Milham | TM Forum | Curating Editor |
Dmytro Gassanov | TM Forum | Network SME |
Rephael Benhamo | Barnet Communications | Additional Input |
Peter Skoularikos | Telekinetics | Additional Input/ reviewer |
<*Select from: Project Chair, Project Co-Chair, Author, Editor, Key Contributor, Additional Input, Reviewer>
10 Comments
Rosie Wilson
Hi Dave Milham,
In order to meet publication requirements, please ensure:
Thanks,
Rosie
Kevin McDonnell
Hi Dave,
A key outcome of this guide should be clarifying the proposed ODA Components and roles, especially the SMIC/"Service IMF" and how they connect to ODA components and AN domains/agents. Some more thoughts below that I tried to explain on the call...
Overall, we should get an alpha approved version without too much resistance...but it does need bigger team effort in the next version to get it to the level of the Wholesale Broadband one (TMFS018). Also this guide and IG1343 could follow a "twin-track alignment" into the next sprint...given the common set of authors etc and shared topic.... Hope this Helps, Kevin
Richard Kilmurray Brad Peters Dmytro Gassanov Jörg Niemoller
Dave Milham
Appreciate the feedback and comments.
Bit of background before responding to the comments linearly
TMFS000: Use Case: ODA Use Case Template v1.0.5 - End to end ODA - TM Forum Confluence
as applied to the scope of the use cases which is assurance across multiple technologies and multiple OSI layer 0 thru 3.
At the moment the information models are not fully complete other than for SDWAN. Focus is on proactive maintenance.
for complete analysis of the rationale for ODA Production Components
So we are a bit stuck trying to socialize amongst multiple teams and gain a cross team consensus
Note the sequence chart has been updated to more clearly identify ODA components and 'network' components supplied by NEP vendors.
Next post addresses the comments one by one.
Richard Kilmurray Brad Peters Dmytro Gassanov Jörg Niemöller
Dave Milham
Inline comments to post
A key outcome of this guide should be clarifying the proposed ODA Components and roles, especially the SMIC/"Service IMF" and how they connect to ODA components and AN domains/agents. Some more thoughts below that I tried to explain on the call...
<<TR313C covers the analysis of the ODA Production Components required to integrate with evolving NEP offers, both traditional reactive (based on TMN M.3010 concepts) and proactive (based on AN, IETF and M.3041). Granularity of Components is definitely a topic for discussion. I opted for coarse grain based on two considerations:
(1) If we break up the functionality of the SMIC into its feature groups, one ends up with more components, around 4 or 5. (2) Then to define them as ODA Components you need to define the exposed and dependent APIs between these 5 components. There is quite high coupling between Intent Management, Orchestration, Closed Loop Management and Agents / Domain Knowledge Management. And this is an area of rapid evolution, especially around agents, and there is a risk of picking integration points that inhibit innovation. Notably there is a study in the 'ODA in a Box' catalyst about more coarse grain specification of systems/components. So granularity needs a fair degree of discussion. Also, what we are proposing isn't that far from the ITU-T Y.3060 proposals, so we need to not fall behind the curve. IMHO better to define a coarse grain component and then refactor based on practical implementation experience.
Very happy to update TR313C to address comments E&OE>>
<<Agree happy to re-read IG 1343 and make proposals. This is very much an alpha draft addressing two specific exam questions. >>
<< can we get some comments on this section so we can figure out what to fix ? Do agree x-D observability aspects need more technical work but was slightly incidental in this Use Case which just assumes you can get notification and events of particular types . Needs checking with APIs and possibly proposing polymorphic schema extension to existing APIs /Components. >>
<< Absolutely correct. Current ODA component table is incomplete hence the review of prior activities in previous post to get proposals from TR313C and ODA Production team into IG1242 which drives the published table at TM Forum - ODA Component Directory >>
Overall, we should get an alpha approved version without too much resistance...but it does need bigger team effort in the next version to get it to the level of the Wholesale Broadband one (TMFS018). Also this guide and IG1343 could follow a "twin-track alignment" into the next sprint...given the common set of authors etc and shared topic.... Hope this Helps, Kevin
<<Happy to support this>>
Richard Kilmurray Brad Peters Dmytro Gassanov Jörg Niemöller
Rephael Benhamo
Dave, Brad.
I have added a list of standards in section 3.5
Dave Milham
Rephael, thank you
Dave Milham
Brad Peters has suggested a simplified model to IETF for related functionality.
Once this is accepted by IETF NMOP we should incorporate it into this document, as it is simpler than the current models.
Dave Milham
Fibre and Physical Line Plant UML models
The base document for developing the Fiber and Physical line plant UML models is an MTOP contribution (2017), which might need to be added to the SID. It is documented in the MTOP series of documents. Note it also covers Radio base station models.
Rosie Wilson
Publish as Alpha version in Sprint 2 with further updates planned for Sprint 3.
Team approval confirmed
Dave Milham
Added some next steps learning points based on team discussion on 15th May 2025 as don't want to miss them in the next update.