Zong Xumei, Deputy General Manager of Network Dept., China Mobile Jiangsu Branch
Ye Wen, Manager of Core Network Office, China Mobile Jiangsu Branch
It has become an urgent need for Jiangsu Mobile to maximize the use of big data and AI technologies to evolve to intelligent O&M and build a zero-interruption network, in order to reduce O&M costs, improve efficiency, ensure better network quality, and improve user digital service experience.
The introduction of 5G, IoT, and industrial digitalization makes networks more complex. In addition, multiple generations of technologies coexist, which makes it difficult to quickly demarcate and locate network faults. Massive numbers of digital services and events drive explosive traffic growth and make it very challenging to keep network operations secure and stable.
After the steam era, the electrical era, and the information era, we are now reaching a new historical turning point. ICT network-based digital technologies, such as big data, cloud computing, and artificial intelligence, are regarded as driving the fourth technological evolution, an era of intelligence. In this new era, technology evolution is accelerating along with industry transformation. The physical and digital economies are merging on a fundamental level, and these changes will greatly impact everyday life. To serve customers in this new era, China Mobile takes the position that network quality is the lifeline of telecommunication companies, and it is actively building next-generation intelligent networks with AI as the catalyst for transforming network O&M. However, as networks and services become more complex, maintenance becomes more difficult just as maintenance efficiency needs to improve. Traditional problem-specific maintenance models therefore cannot keep up with these network developments. The transformation of network O&M to intelligent maintenance based on digital technologies has become a global trend in the telecoms industry, and it spans multiple adjacent industries to become one of the most competitive markets in the world.
It is predicted that by 2025, the global connectivity index will reach 100 billion and communication networks will serve as information superhighways, so network faults will affect a wider scope. One reason is that 5G, IoT, and industrial digitalization have made networking far more complex. For a period of time, different generations of technologies coexist on live networks, which makes it difficult to quickly demarcate and locate network faults. Massive numbers of digital services and activities at key events have caused explosive traffic growth and pose great challenges to secure and stable network operation. How can an operator provide better network quality, stronger network competitiveness, and a better digital service experience with high efficiency, all at lower O&M costs? This is the challenge facing Jiangsu Mobile in this new era.
In the traditional passive, emergency-driven Run-to-Failure (R2F) mode, maintenance engineers struggle to stay on top of frequent network faults, and fault recovery time varies significantly. Preventive Maintenance (PvM), that is, routine inspection and maintenance, can prevent faults in advance, but its efficiency is low. Most network device faults do not appear out of the blue. They develop over time and leave traces before the device finally fails, so engineers can detect a fault before it occurs based on changes in physical status or working parameters. As the engine for O&M evolution, Predictive Maintenance (PdM) enabled by digital technology allows engineers to predict the likelihood of a device fault and then perform targeted maintenance. Faults can thus be prevented before user services are affected, which also makes planned routine maintenance more efficient. Jiangsu Mobile is expanding its use of big data mining and artificial intelligence to evolve towards intelligent maintenance, aiming to build a robust, zero-interruption network.
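The idea that a developing fault leaves traces in working parameters can be illustrated with a minimal sketch. It assumes equally spaced readings of a single parameter and a known alarm threshold, fits a least-squares line, and estimates how many sampling intervals remain before the threshold is reached. The function name, the sample values, and the threshold are hypothetical and are not taken from the Jiangsu Mobile project.

```python
def intervals_until_threshold(samples, threshold):
    """Estimate how many sampling intervals remain before a rising
    parameter reaches `threshold`, using a least-squares linear trend.

    `samples` are equally spaced readings, oldest first. Returns None
    when there are too few samples or the trend is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or falling trend: no predicted crossing
    intercept = y_mean - slope * x_mean
    x_cross = (threshold - intercept) / slope
    # Measure from the most recent sample; never report a negative lead time.
    return max(0.0, x_cross - (n - 1))


# Hypothetical board temperature readings (degrees C), one per hour.
readings = [60, 62, 64, 66, 68]
lead_time = intervals_until_threshold(readings, threshold=80)
```

A real predictive model would consider many parameters and non-linear behavior. The sketch only shows why a trend gives engineers lead time that an alarm raised at the threshold does not.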
China Mobile Jiangsu divides intelligent O&M evolution into five phases. In the first phase, AI points out which fault has happened. In the second phase, it explains why the fault happened. In the third phase, it predicts what will happen, providing a basis for engineers to take countermeasures. In the fourth phase, AI decides which measures to take, though engineers still need to perform the operations. In the final phase, AI achieves full autonomous control and automatic network repair, enabling the network to heal itself.
Predictability is the new value that AI brings to telecom networks. Intelligent O&M evolution is a long-term process and cannot be achieved overnight. As the eminent computer scientist Alan Kay once said, “The best way to predict the future is to invent it.” Adopting the model of developing high-level strategies, low-level methods, and quick actions, Jiangsu Mobile worked with Huawei's robust network project team to actively explore and successfully introduce intelligent O&M practices. Jiangsu Mobile uses big data analysis and AI algorithms to mine the massive volumes of data and O&M experience accumulated during network operations, and builds intelligent O&M capabilities that cover prevention, diagnosis, protection, and evaluation. This model is key to improving maintenance efficiency and reducing network faults, helping to safeguard live networks.
Using the industry's latest intelligent maintenance technologies, Jiangsu Mobile built four lines of defense to safeguard the robust network.
The network automatically collects historical indicator data from the live network, analyzes service indicators and error code data, and extracts features such as periodic fluctuation, YoY/MoM comparisons, statistics, and distribution. Using different algorithms, risk prediction models are developed for different service types. Data from the live network is then compared against the risk prediction models in real time, helping to accurately identify faults in advance. Field results have shown that intelligent risk prediction can identify network faults several hours in advance, addressing a maintenance pain point of VoLTE services, where faults could traditionally be identified only through alarms and user complaints.
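Comparing live data against a model learned from periodic history can be sketched as follows. This is a simplified stand-in, not the project's actual algorithm: it assumes one indicator sampled at a fixed rate, builds a per-slot mean and standard deviation from the history (for example one slot per hour of the day), and flags a live value that deviates by more than a chosen number of standard deviations. All names, the period, and the threshold are assumptions made for illustration.

```python
from statistics import mean, stdev


def build_baseline(history, period):
    """Group historical samples by their position within the period and
    return one (mean, stdev) pair per slot.

    Each slot needs at least two samples for stdev to be defined.
    """
    slots = [[] for _ in range(period)]
    for i, value in enumerate(history):
        slots[i % period].append(value)
    return [(mean(s), stdev(s)) for s in slots]


def is_risky(baseline, slot, live_value, threshold=3.0):
    """Return True when the live value deviates from the slot's baseline
    by more than `threshold` standard deviations."""
    mu, sigma = baseline[slot]
    if sigma == 0:
        return live_value != mu
    return abs(live_value - mu) / sigma > threshold


# Hypothetical indicator (e.g. call attempts), three "days" of four slots each.
history = [100, 200, 300, 200,
           102, 198, 301, 203,
           98, 202, 299, 197]
baseline = build_baseline(history, period=4)
normal = is_risky(baseline, slot=0, live_value=101)
abnormal = is_risky(baseline, slot=0, live_value=120)
```

Keeping a separate baseline per slot is what lets the check respect daily fluctuation: a value that is normal at peak hour can still be flagged at night.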
The network automatically collects traffic statistics, alarms, and operation logs through information aggregation and drilling, and provides fault aggregation based on Call History Record (CHR) information. The network then performs online convergence and analysis on the large number of CHRs and alarm/IP data generated by the faults. It quickly locates the problem distribution across nine dimensions, such as number, terminal, and cell, and locates the faulty Network Element (NE) by aggregating alarms across different NEs. This approach greatly improves the efficiency of analyzing massive volumes of alarm and log data. In addition, the experience of maintenance experts is digitalized. Huawei's global VoLTE maintenance experience and handling suggestions for more than 10,000 NE internal error codes are converted into judgment logic and rules that a maintenance IT platform can execute. Using the open-source rule engine Drools, fault analysis is transferred from engineers to machine intelligence. In the project, rules are also decoupled from software code so that they can be updated and maintained in fast iterations.
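Locating the problem distribution across dimensions can be pictured with a small sketch. It assumes failed CHRs are available as dictionaries keyed by dimension name, and it reports every dimension in which a single value accounts for most of the failures. The field names, the record contents, and the 80% share threshold are hypothetical. The real platform covers nine dimensions and also correlates alarms across NEs.

```python
from collections import Counter


def locate_concentration(failed_chrs, dimensions, min_share=0.8):
    """For each dimension, find the most common value among the failed
    records and report it when it covers at least `min_share` of them.

    Returns {dimension: (value, share)} for the concentrated dimensions.
    """
    findings = {}
    total = len(failed_chrs)
    if total == 0:
        return findings
    for dim in dimensions:
        counts = Counter(record[dim] for record in failed_chrs)
        value, count = counts.most_common(1)[0]
        share = count / total
        if share >= min_share:
            findings[dim] = (value, share)
    return findings


# Hypothetical failed call records: failures cluster in cell C1.
failed = [
    {"number": "N1", "terminal": "T1", "cell": "C1"},
    {"number": "N2", "terminal": "T2", "cell": "C1"},
    {"number": "N3", "terminal": "T3", "cell": "C1"},
    {"number": "N4", "terminal": "T1", "cell": "C1"},
    {"number": "N5", "terminal": "T2", "cell": "C2"},
]
findings = locate_concentration(failed, ["number", "terminal", "cell"])
```

Here the failures are spread across numbers and terminals but concentrated in one cell, which points troubleshooting at that cell.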
To ensure a successful cutover, a traditional cutover project team usually formulates detailed implementation solutions and assurance plans, yet cutover accidents still occur all too often. In intelligent network cutover assurance, innovation lies mainly in three phases: operation, verification, and on-duty support. In the operation phase, intelligent E2E risk detection applies different monitoring policies depending on whether operations are performed on an NE and whether services are affected. This helps identify errors during the operation and alerts operators to correct them before services are impacted. In the verification phase, the network performs fast, in-depth service verification based on an indicator system built from NEs, scenarios, and expert experience, and through automatic analysis of alarms, logs, and dialing test/CHR data. In the on-duty support phase, the network uses intelligent auxiliary tools to monitor user complaints in real time, quickly identify and report operation-related risks, and quickly mitigate risks using expert experience. Digital technologies thereby address the pain points of traditional cutover processes, such as the lack of in-process adjustment, insufficient verification, and passive on-duty support.
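The operation-phase choice of monitoring policy depends on two conditions, so it can be read as a small decision table. The sketch below is purely illustrative: the policy names and polling intervals are hypothetical and are not taken from the project, which does not publish its actual policies.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    name: str         # hypothetical policy label
    interval_s: int   # hypothetical polling interval in seconds


def select_policy(operated: bool, service_affecting: bool) -> Policy:
    """Pick a monitoring policy for an NE during a cutover window.

    operated:          an operation is being performed on this NE
    service_affecting: services carried by this NE are affected
    """
    if operated and service_affecting:
        return Policy("intensive", 10)   # watch KPIs and alarms closely
    if operated:
        return Policy("focused", 60)     # check operation results and logs
    if service_affecting:
        return Policy("watch", 60)       # untouched NE, but its services are involved
    return Policy("baseline", 300)       # routine monitoring only


policy = select_policy(operated=True, service_affecting=True)
```

Making the two conditions explicit keeps attention on the NEs where an operator error would reach users first.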
Even seemingly unpredictable network risks leave evidence before faults occur. Traditional network inspection methods, however, are inefficient and demand highly skilled personnel. Using intelligent analysis methods, the project defines five evaluation rule dimensions: basic evaluation, high-frequency online evaluation, special evaluation, trend evaluation, and user-defined evaluation. Respectively, these check the rationality of static device configurations, check the real-time running status of device software and hardware, perform in-depth log checks on system running status, check the trend of hardware and software resources, and check maintenance-defined rules. Together they comprehensively evaluate the robustness of the equipment and monitor network risks. When routine maintenance experience is consolidated into rules, online real-time data collection combined with intelligent identification and analysis can improve the accuracy of network risk evaluation by 90% or higher.
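Consolidating maintenance experience into rules can be sketched by keeping the rules as data and running them through a generic evaluator. This is a plain Python illustration, not the project's rule platform. The rule names, the device fields, and the limits are hypothetical, and only three of the five dimensions are shown.

```python
# Each rule names its evaluation dimension and carries a check that takes a
# device snapshot and returns True when the device passes. All fields and
# limits here are hypothetical.
RULES = [
    {"dimension": "basic", "name": "ntp_configured",
     "check": lambda d: d["config"].get("ntp_server") is not None},
    {"dimension": "high_frequency_online", "name": "cpu_below_80",
     "check": lambda d: d["cpu_percent"] < 80},
    {"dimension": "trend", "name": "disk_growth_ok",
     "check": lambda d: d["disk_percent"][-1] - d["disk_percent"][0] < 10},
]


def evaluate(device, rules):
    """Run every rule against the snapshot and group results by dimension."""
    report = {}
    for rule in rules:
        passed = bool(rule["check"](device))
        report.setdefault(rule["dimension"], []).append((rule["name"], passed))
    return report


def failed_rules(report):
    """List the names of all rules that did not pass."""
    return [name for results in report.values()
            for name, passed in results if not passed]


# Hypothetical snapshot of one device.
device = {
    "config": {"ntp_server": "10.0.0.1"},
    "cpu_percent": 91,
    "disk_percent": [40, 42, 45],
}
report = evaluate(device, RULES)
risks = failed_rules(report)
```

Because the evaluator never changes when a rule is added, new maintenance experience can be captured by appending an entry to the rule list.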
Over the next three years, Jiangsu Mobile and Huawei will extend their cooperation to more domains of intelligent O&M. To welcome the new era of intelligent 5G networks, China Mobile Group is also actively building next-generation intelligent networks. In the future, AI technologies will be interwoven into the wireless network, core network, and transmission network. Intelligent networks and intelligent O&M will become dual digital engines that help China Mobile and other telecommunication enterprises transform towards intelligent operations, and they will become the key competitive differentiator in the future digital ecosystem.