
SURPASS hiE 9200 V3.

2




Recovery




Id:0900d8058008c996
The information in this document is subject to change without notice and describes only the
product defined in the introduction of this documentation. This documentation is intended for the
use of Nokia Siemens Networks customers only for the purposes of the agreement under which
the document is submitted, and no part of it may be used, reproduced, modified or transmitted
in any form or means without the prior written permission of Nokia Siemens Networks. The
documentation has been prepared to be used by professional and properly trained personnel,
and the customer assumes full responsibility when using it. Nokia Siemens Networks welcomes
customer comments as part of the process of continuous development and improvement of the
documentation.
The information or statements given in this documentation concerning the suitability, capacity,
or performance of the mentioned hardware or software products are given "as is" and all liability
arising in connection with such hardware or software products shall be defined conclusively and
finally in a separate agreement between Nokia Siemens Networks and the customer. However,
Nokia Siemens Networks has made all reasonable efforts to ensure that the instructions
contained in the document are adequate and free of material errors and omissions. Nokia
Siemens Networks will, if deemed necessary by Nokia Siemens Networks, explain issues which
may not be covered by the document.
Nokia Siemens Networks will correct errors in this documentation as soon as possible. IN NO
EVENT WILL Nokia Siemens Networks BE LIABLE FOR ERRORS IN THIS DOCUMENTATION
OR FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, DIRECT, INDIRECT,
INCIDENTAL OR CONSEQUENTIAL OR ANY LOSSES, SUCH AS BUT NOT LIMITED
TO LOSS OF PROFIT, REVENUE, BUSINESS INTERRUPTION, BUSINESS OPPORTUNITY
OR DATA, THAT MAY ARISE FROM THE USE OF THIS DOCUMENT OR THE INFORMATION
IN IT.
This documentation and the product it describes are considered protected by copyrights and
other intellectual property rights according to the applicable laws.
The wave logo is a trademark of Nokia Siemens Networks Oy. Nokia is a registered trademark
of Nokia Corporation. Siemens is a registered trademark of Siemens AG.
Other product names mentioned in this document may be trademarks of their respective
owners, and they are mentioned for identification purposes only.
Copyright Nokia Siemens Networks 2007. All rights reserved.
Important Notice on Product Safety
Elevated voltages are inevitably present at specific points in this electrical equipment.
Some of the parts may also have elevated operating temperatures.
Non-observance of these conditions and the safety instructions can result in personal
injury or in property damage.
Therefore, only trained and qualified personnel may install and maintain the system.
The system complies with the standard EN 60950 / IEC 60950. All equipment connected
has to comply with the applicable safety standards.




Table of Contents
This document has 78 pages.
1 Introduction
2 Summary of Characteristics
2.1 Installation Recovery
2.2 Central Recovery
2.3 Recovery in the Call-processing Periphery
2.4 Starting Recovery Levels and Supervising Their Running
2.5 Escalation
2.6 Transition to Basic Call-processing in the CP
2.7 Central and Installation Recovery - Effects of Recovery and Recovery Actions
2.8 Starting Options
2.9 Symptom Saving
2.9.1 Standard Symptom Saving
2.9.2 Data Collection - Additional Symptom Collection and Immediate Symptom Saving
2.10 Postprocessing
3 Installation Recovery
3.1 INSTALL Recovery
3.2 Initial Start 2 Forced Recovery (ISTART2F)
4 Central Recovery
4.1 CP Recovery
4.1.1 New Start Level 0 (NSTART0)
4.1.2 New Start Level 1 (NSTART1)
4.1.3 New Start Level 2 (NSTART2)
4.1.4 New Start Level 3 (NSTART3)
4.1.5 New Start Level 1B (NSTART1B)
4.2 System Recovery
4.2.1 Initial Start Level 1 (ISTART1)
4.2.2 Initial Start Level 1B (ISTART1B)
4.2.3 Initial Start Level 2 (ISTART2)
4.2.4 Initial Start Level 2R (ISTART2R)
4.2.5 Initial Start Level 2G (ISTART2G)
4.3 Escalation Strategy for Central Recovery Levels
5 Recovery in the Call-processing Periphery
5.1 LTG Recovery
5.1.1 Recovery Level 1 (LTG New Start)
5.1.2 Recovery Level 2.1 (LTG Initial Start 1)
5.1.3 Recovery Level 2.2 (LTG Initial Start 2)
5.1.4 Escalation Strategy
5.2 DLU Recovery
5.2.1 Recovery Level New Start
5.2.2 Recovery Level Initial Start
5.2.3 Escalation Strategy
5.3 CCNC Recovery
5.3.1 Recovery Level 2, Initial Start for an SILTD, Initial Start for an SILT Group
5.3.2 Recovery Level 2, Initial Start for a CCNP
5.3.3 Recovery Level 2, Initial Start for the Entire CCNC
5.3.4 Escalation Strategy
5.4 RSU Recovery
5.4.1 Recovery Level 2.1
5.4.2 Recovery Level 2.2
5.5 MCP Recovery
5.5.1 MCP Initialization
5.5.2 MCP Soft Reset
5.5.3 MCP Full Reset
5.5.4 Error Handling in Case of Isolation
5.5.5 Escalation Strategy
5.6 MCT Recovery
5.6.1 Recovery Level 1
5.6.2 Recovery Level 2.1
5.6.3 Recovery Level 2.2
5.6.4 MCT 2.1 Bulk Recoveries
5.6.5 Escalation Strategy
6 MP Recovery
6.1 Recovery Concept
6.2 Recovery Levels
6.2.1 Installation
6.2.2 Low-level Recoveries
6.2.3 High-level Recoveries
6.3 Parallel-running Recoveries
6.4 Escalation
6.5 Symptom Saving
6.5.1 Standard Symptom Saving
6.5.2 Data Collection - Additional Symptom Collection and Immediate Symptom Saving




List of Figures
Figure 1 Main recovery actions in the CP, LTG/MCT, MCP, RSU, DLU and CCNC with an ISTART2F, ISTART2R, ISTART2G
Figure 2 Diagram of CCNC recovery escalation
Figure 3 Handling of software problems by Software Recovery


List of Tables
Table 1 Effects of recovery by the central and installation recovery levels on calls and call-charge data
Table 2 Main recovery actions of the central and installation recovery levels in the CP
Table 3 Handling of the call-processing periphery by central and installation recovery
Table 4 Starting options for recovery
Table 5 Identification of system files created by data collection for saving data collection symptoms (DC.V<nn>)
Table 6 Identification of system files created by data collection for saving data collection symptoms (DC.B<nn>)
Table 7 Recovery actions that are optional or fixed in an ISTART2, depending on how the recovery was started
Table 8 Central recovery levels and their escalation parameters
Table 9 Brief overview of the main recovery effects of the LTG recovery levels
Table 10 LTG recovery levels and their escalation parameters
Table 11 DLU recovery levels and their escalation parameters
Table 12 Downtimes for transient calls (recovery level 2.1 in the RSU units)
Table 13 Downtimes for transient calls (recovery level 2.2 in the RSU units)
Table 14 Downtimes for established calls (recovery level 2.2 in the RSU units)
Table 15 MTP recovery levels and their escalation parameters
Table 16 MCT recovery levels and their escalation parameters
Table 17 Synchronization points (SP) in high-level recoveries
Table 18 Parallel-running recoveries (initiated due to software error)
Table 19 Parallel-running recoveries (initiated by a system operator)
Table 20 MP recovery levels and their escalation parameters




Id:0900d8058008c5bb
1 Introduction
SURPASS hiE 9200 is a system with extensive system safeguarding features. These
system safeguarding features guarantee subscribers and providers full availability and
performance of the SURPASS hiE 9200 system.
A component part of the extensive system safeguarding features is the recovery function.
Recovery encompasses all measures which aim at putting the hardware and
software of the system into service (start-up).
Recovery applications
Portrayed in simplified terms, recovery has two applications:
• First-time installation of an APS generation or in-service change of APS generation
  (upgrade).
• Restoration of switching operation after a fault during operation.
Recovery types
Three types of recovery ensue from the two applications:
• Installation recovery with the aim of putting the system into service.
  This encompasses recovery actions for initial start-up of the system in a new
  network node and recovery actions for using a new APS version (APS change) in
  an existing network node.
• Central recovery with the aim of restoring switching operation after a fault in the
  coordination processor (CP).
  This encompasses recovery actions which neutralize a fault and restore performance
  of the network node during operation.
• Recovery in the call-processing periphery with the aim of returning subsystems to
  service after a fault.
  This encompasses recovery actions which neutralize a fault and make the subsystem
  affected available to the CP again during operation.
  Subsystems of the call-processing periphery in which a recovery can be performed
  are: line/trunk group (LTG), media control task (MCT), digital line unit (DLU), remote
  switching unit (RSU) and media control platform (MCP) as well as common channel
  signaling network control (CCNC) or signaling system network control (SSNC).
Recovery organization
Installation recovery for initial start-up of the system must be seen as a separate entity
from central and peripheral recovery.
There are analogies between central recovery and recovery in the call-processing
periphery: both serve to restore switching operation after a fault. The higher recovery
levels of central recovery initiate a recovery in the call-processing periphery when
serious faults occur. SSNC and CP operate in a loose form of cooperation; recoveries
in the SSNC are therefore executed independently of recoveries in the CP.
An overview of the three recovery types can be found in sections Installation Recovery,
Central Recovery and Recovery in the Call-processing Periphery, seen from the following
standpoints: the reason for a recovery, the task and levels of recovery, the areas
affected by recovery, and the effects of recovery.
Aim of recovery
The aim of recovery is to neutralize a fault in such a way that switching operation is either
not at all impaired or impaired only to a slight extent. Central recovery and recovery in
the call-processing periphery are for this purpose divided up into recovery levels. The
individual recovery levels initiate specific recovery actions (depending on the type of
fault) which quickly and effectively restore the system to full performance.
Effects of recovery on switching operation
As the results of statistical investigations have shown, the majority of recovery cases do
not lead to any noticeable impairment of switching operation.
In the remaining cases of recovery, more extensive recovery actions are initiated (see
section System Recovery).
Structure of this document
This document is applicable to all configurations of SURPASS hiE 9200 (TDM, IP
and combined TDM/IP network node). In a pure TDM environment, the more
conventional configuration with CCNC is also possible. All statements relating to the
CCNC refer to this conventional configuration, while statements relating to the
SSNC refer to all other configurations.
• Chapter Summary of Characteristics provides a brief description of the basic
  features and functions of recovery.
• The individual recovery levels themselves are described in chapters Installation
  Recovery, Central Recovery and Recovery in the Call-processing Periphery in such
  a way that they can be read independently of each other. This structural organization
  has been chosen specially for those readers who wish to read up on the recovery
  actions and the effects of recovery of a specific recovery level.
  The format of these completely independent sections results in a small degree of
  informational redundancy, which has the advantage of enabling the reader
  to glean ample information from one part of the document alone.
• The escalation strategies for recovery in the call-processing periphery are described
  in chapter Recovery in the Call-processing Periphery.
• MP recoveries in the SSNC are described separately in chapter MP Recovery.
• References to the RSU are only relevant to systems where RSUs are connected to
  the network node.
• In this document, the term ISTART (used for system recovery) corresponds
  to the term USTART in the recovery output masks.




Id:0900d8058008c5bc
2 Summary of Characteristics
This chapter provides a brief and clear summary of the important features and functions
of recovery.
2.1 Installation Recovery
Reason
• Initial start-up of a new network node.
• APS change in an existing network node.
Task/Aim
• Installation of an APS generation.
• Start-up of the system.
Recovery levels
• INSTALL Recovery
  For installing an APS generation, including installing and completing the database.
  Aim: system runup in the CP.
• Initial Start 2 Forced Recovery (ISTART2F)
  For running up the system in the CP and for putting the call-processing periphery
  into service.
  Aim: commencement of switching operation.
Recovery areas
• With INSTALL recovery: CP.
• With ISTART2F recovery: CP and call-processing periphery.
Effects of recovery with an APS change
The system is split into two system halves: a switching (call-processing) system half and
a non-switching system half.
The APS change is performed in the non-switching system half: installation and runup
of the new APS generation (by means of INSTALL recovery).
Meanwhile, switching operation is continued in the switching system half.
Once the new APS generation has been verified, the two halves of the system are
merged and switching operations are transferred to the new APS generation. The
various procedures involved are described in the Nonstandard Maintenance manual
(NM:SW).
2.2 Central Recovery
Reason
• Fault which occurs in the CP during operation.
• System operator starts one of the recovery levels of central recovery by entering an
  MML command.
Task/Aim
Central recovery in the CP with the aim of neutralizing the fault using the recovery level
which least impairs switching operation and immediately restores the system to full
performance.
Recovery levels of CP recovery
CP recovery is subdivided into several new start (NSTART) recovery levels, including:
NSTART0, NSTART1, NSTART2, NSTART3, and NSTART1B.
• NSTART recovery levels.
  The NSTART recovery levels are started by faults:
  - Which occur in the CP.
  - Which call for recovery actions in the CP.
  - Which do not call for recovery actions in the call-processing periphery.
  The faults which lead to an NSTART are:
  - Software faults which are detected by the user processes.
  - Certain hardware faults which are restricted locally to one operating unit in the
    CP; for example, failure of a single processor.
• NSTART1B recovery level.
  The NSTART1B recovery level initiates basic call-processing.
  NSTART1B is started if the following exceptional situation arises:
  - A fault in the CP calls for recovery level NSTART2, NSTART3 or ISTART1 and,
    simultaneously,
  - Access is not possible to certain system files.
  In such an exceptional situation, NSTART2, NSTART3 or ISTART1 escalates to
  recovery level NSTART1B which then initiates basic call-processing.
Recovery levels of system recovery
System recovery is subdivided into the following initial start (ISTART) recovery levels:
ISTART1, ISTART2, ISTART2R, ISTART2G and ISTART1B.
• ISTART recovery levels.
  The ISTART recovery levels are started in response to faults:
  - Which occur in the CP.
  - Which call for recovery actions in the CP.
  - Which call for additional recovery actions in the call-processing periphery.
  The faults which lead to an ISTART are:
  - Software errors which are detected by operating system processes.
  - Serious hardware faults, such as a short-term fault in both bus systems.
  - Recovery escalation, if the highest NSTART recovery level escalates to the
    lowest ISTART recovery level.
• ISTART1B recovery level.
  The ISTART1B recovery level also initiates basic call-processing.
  ISTART1B is started if the following situation arises:
  Recovery level ISTART1 has been started, but it cannot neutralize the fault. The
  reason for this is that further faults have occurred while ISTART1 is running or during
  the subsequent supervision phase (after ISTART has been completed).
  In such a situation, ISTART1 escalates to recovery level ISTART1B which then initiates
  basic call-processing.




Recovery areas
The following is valid for recovery actions:
• NSTARTs are restricted to the CP.
• ISTARTs also affect the call-processing periphery.
Effects of recovery
The SASDATS (saving of software data with statistics) function is used to document
software problems. The process that has detected a software problem can use
SASDATS (or SASDAT) to provide a documentary record at the same time as attempting
to neutralize the fault. The few remaining software faults are themselves then neutralized
by the recovery level NSTART0. There is no impairment in call-processing in
either case.
With the NSTART recovery levels, the calls are maintained and the call-charge data in
the memory are retained.
With the ISTART recovery levels, existing calls are cleared down and after a brief interruption
(system initialization), switching operation is resumed. The call-charge data in
the memory are retained (exceptions: ISTART2 and ISTART2G).
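The effects described above can be summarized as a small lookup. The following is a minimal sketch only: the function name and the returned dictionary layout are hypothetical and are not part of the system software; the level names and the two exceptions are taken from this section.

```python
# Illustrative summary of the effects of central recovery on calls and on
# call-charge data in memory. Names and data layout are hypothetical.
NSTART_LEVELS = {"NSTART0", "NSTART1", "NSTART2", "NSTART3", "NSTART1B"}
ISTART_LEVELS = {"ISTART1", "ISTART1B", "ISTART2", "ISTART2R", "ISTART2G"}
# Exceptions named in the text: charge data in memory are not retained.
CHARGE_DATA_EXCEPTIONS = {"ISTART2", "ISTART2G"}


def recovery_effects(level: str) -> dict:
    """Return the effect of a central recovery level on calls and charge data."""
    if level in NSTART_LEVELS:
        # NSTART levels: calls are maintained, charge data are retained.
        return {"calls_maintained": True, "charge_data_retained": True}
    if level in ISTART_LEVELS:
        # ISTART levels: existing calls are cleared down.
        return {
            "calls_maintained": False,
            "charge_data_retained": level not in CHARGE_DATA_EXCEPTIONS,
        }
    raise ValueError(f"not a central recovery level: {level}")


if __name__ == "__main__":
    for name in ("NSTART2", "ISTART1", "ISTART2G"):
        print(name, recovery_effects(name))
```

The authoritative list of effects is the table Effects of recovery by the central and installation recovery levels on calls and call-charge data.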
2.3 Recovery in the Call-processing Periphery
Reason
• Faults which occur in the LTG, RSU, DLU, CCNC or SSNC during operation.
• System recovery which runs in the CP and initiates a recovery in the LTG, DLU, and
  CCNC (the CP is not capable of launching an SSNC recovery).
Task/Aim
Peripheral recovery in the subsystem affected with the aim of immediately making the
subsystem available to the system again.
Recovery levels
Various recovery levels, such as new start or initial start, are available depending on the
subsystem and the effect of the fault in the subsystem.
Recovery area
Recovery in the call-processing periphery is restricted in each case to the subsystem in
which it runs. There is a special situation between LTG operating in the B-function and
the DLU connected (see LTG Recovery and DLU Recovery).
Effects of recovery
When a recovery is run in the LTG or DLU, switching operation is interrupted only briefly
or not at all. Due to the load sharing aspect of the DLU, only up to 50% of the subscribers
of a DLU are affected.
When a recovery is run in the CCNC, the redundant unit continues switching operation
without any trouble.
In the case of a system recovery which initiates a recovery in the LTG, RSU, DLU, and
CCNC, starting from the CP, call-processing is interrupted briefly until the recovery
actions have been completed.


2.4 Starting Recovery Levels and Supervising Their Running
Each fault is neutralized by a recovery level which:
• Is specifically oriented to the fault situation that has arisen.
• Removes the effects of the fault by means of direct recovery actions.
The scope and effect of the recovery actions increase from recovery level to recovery
level. Thanks to the specially designed recovery levels, the system is restored quickly
and effectively to full performance.
Starting recovery levels
The program that detects a fault calls for the safeguarding programs to start a specific
recovery level. In the case of central recovery in the CP, the software fault analysis starts
the recovery level called for. In the case of recovery in the LTG, RSU, DLU or CCNC,
there is a specific (LTG, DLU or CCNC) safeguarding program for each case.
Supervising running of a recovery level
To ensure that the recovery level called for clears the fault completely, the safeguarding
programs carry out several supervisory checks:
• Run supervision: Check to ensure the recovery level runs properly. Check to see
  whether further faults occur while the recovery level is running.
• Time supervision: Check to ensure the recovery level runs successfully within a
  specified period; that is, neutralization is completed successfully.
• Success supervision: Check to see whether (further) faults occur again after
  neutralization (during the subsequent supervision phase) and whether the defined error
  threshold is exceeded.
If one of the checks returns a negative result, recovery escalation ensues (see section
Escalation).
The safeguarding programs in the CP also have a higher-order function: they check the
course of recovery which runs in the LTG, RSU, DLU or CCNC, from the central location
of the CP.
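The three supervisory checks can be sketched as follows. This is an illustration under stated assumptions, not the safeguarding software itself: the class, the field names and the function names are hypothetical, and the success check uses the "threshold is exceeded" wording of this section.

```python
from dataclasses import dataclass


@dataclass
class RecoveryRun:
    """Observed outcome of one recovery level run (hypothetical model)."""
    faults_during_run: int            # further faults while the level was running
    duration_s: float                 # time the recovery level needed
    faults_in_supervision_phase: int  # faults after neutralization


def failed_checks(run: RecoveryRun, time_limit_s: float, error_threshold: int) -> list:
    """Return the names of the supervisory checks with a negative result."""
    failed = []
    if run.faults_during_run > 0:
        failed.append("run")       # run supervision
    if run.duration_s > time_limit_s:
        failed.append("time")      # time supervision
    if run.faults_in_supervision_phase > error_threshold:
        failed.append("success")   # success supervision
    return failed


def must_escalate(run: RecoveryRun, time_limit_s: float, error_threshold: int) -> bool:
    """Escalation ensues if at least one check returns a negative result."""
    return bool(failed_checks(run, time_limit_s, error_threshold))
```

A run with no further faults that completes within the time limit yields an empty list, so no escalation is requested.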
2.5 Escalation
Escalation parameters
The safeguarding programs initiate an escalation to the next-highest recovery level if
certain conditions arise. These conditions are defined by means of escalation parameters,
such as:
• Supervision time
  This is the length of time during which the respective safeguarding program supervises
  whether and how many further faults occur.
• Error threshold limit
  If a certain number of faults has occurred within the supervision time, that is, the
  error threshold limit set in the system has been reached, then recovery escalates.
  The respective safeguarding program starts the next-highest recovery level.
The escalation parameters for central recovery are described in chapter Central Recovery;
for LTG, RSU, DLU, and CCNC recovery in sections LTG Recovery, DLU Recovery,
CCNC Recovery and RSU Recovery; and for MP recovery in section Escalation in
chapter MP Recovery.
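The interaction of the two escalation parameters can be illustrated with a short sketch. It assumes that the supervision time is a fixed window that starts when the recovery level has completed; the class and method names are hypothetical, and the actual parameter values are those given in the chapters referenced above.

```python
class SupervisionPhase:
    """Counts faults within the supervision time (hypothetical model).

    Assumption: the supervision time is a fixed window starting at start_s.
    """

    def __init__(self, start_s: float, supervision_time_s: float, error_threshold: int):
        self.start_s = start_s
        self.supervision_time_s = supervision_time_s
        self.error_threshold = error_threshold
        self.fault_count = 0

    def report_fault(self, now_s: float) -> bool:
        """Record a fault; return True if recovery should escalate."""
        if now_s - self.start_s > self.supervision_time_s:
            # The supervision time has expired: the fault is not counted here.
            return False
        self.fault_count += 1
        # Escalate once the error threshold limit has been reached.
        return self.fault_count >= self.error_threshold


if __name__ == "__main__":
    phase = SupervisionPhase(start_s=0.0, supervision_time_s=60.0, error_threshold=3)
    for t in (10.0, 20.0, 30.0):
        print(t, phase.report_fault(t))
```

A fault reported after the window has expired is ignored by this object; in a fuller model it would be handled as a new fault with its own recovery level.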




Aims of escalation
The following are the aims pursued by escalation:
• To prevent faults from repeatedly initiating a recovery of the same level.
• To prevent faults from being aggravated and spreading further.
• To prevent faults from unnecessarily impairing switching operation.
Escalation strategy of central recovery: initiating basic call-processing
The basic features of the escalation strategy of central recovery are presented first, as
they are of central significance for switching operation.
The escalation strategy in the CP allows the NSTART recovery levels (of CP recovery)
to escalate up to, but no further than, recovery level ISTART1. ISTART1 is the
lowest recovery level of system recovery. If ISTART1 does not lead to success
either, the software fault analysis then initiates basic call-processing via the ISTART1B
recovery level.
In exceptional cases, NSTART2, NSTART3 or ISTART1 escalates directly to the
NSTART1B recovery level which likewise initiates basic call-processing (see section
Transition to Basic Call-processing in the CP and section Escalation Strategy for Central
Recovery Levels).
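The escalation path outlined above can be sketched as a simple transition function. Two points are assumptions made for illustration: the NSTART levels are taken to escalate one step at a time in numerical order, and the function name and its parameter are hypothetical. The authoritative escalation rules are those in section Escalation Strategy for Central Recovery Levels.

```python
from typing import Optional

# Assumption: NSTART levels escalate step by step in numerical order.
NSTART_ORDER = ["NSTART0", "NSTART1", "NSTART2", "NSTART3"]
# Levels that escalate directly to NSTART1B if certain system files
# cannot be accessed (exceptional case described in the text).
FILE_DEPENDENT_LEVELS = {"NSTART2", "NSTART3", "ISTART1"}


def next_level(current: str, system_files_accessible: bool = True) -> Optional[str]:
    """Return the recovery level that the given level escalates to, if known."""
    if not system_files_accessible and current in FILE_DEPENDENT_LEVELS:
        return "NSTART1B"  # initiates basic call-processing
    if current in NSTART_ORDER:
        i = NSTART_ORDER.index(current)
        if i + 1 < len(NSTART_ORDER):
            return NSTART_ORDER[i + 1]
        return "ISTART1"   # NSTART levels escalate no further than ISTART1
    if current == "ISTART1":
        return "ISTART1B"  # initiates basic call-processing
    return None            # not covered by this summary


if __name__ == "__main__":
    level = "NSTART0"
    while level is not None:
        print(level)
        level = next_level(level)
```

Returning None for ISTART1B and NSTART1B reflects that basic call-processing is intended to prevent escalation to higher recovery levels.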
2.6 Transition to Basic Call-processing in the CP
The system status known as "basic call-processing" allows a greatly reduced process
set to be activated. Basic call-processing guarantees that the basic call-processing functions
are maintained. In this manner, faults in areas of the software which are not concerned
with call-processing are masked out. The functions which are of no relevance to
call-processing are not put into service during system recovery. Basic call-processing
therefore represents a measure for masking software errors in areas of the
software which are not relevant to call-processing.
Basic call-processing is not created for network nodes in mobile communication networks.
Basic call-processing is initiated via recovery level ISTART1B or NSTART1B in the following
situations:
• On escalation from ISTART1 to ISTART1B. Reason for escalation: ISTART1 is not
  able to neutralize a fault. Further faults have occurred while ISTART1 is running or
  during the subsequent supervision phase (after ISTART has been completed).
• On escalation from NSTART2, NSTART3 or ISTART1 to NSTART1B. Reason for
  escalation: the exceptional situation has arisen that it is not at this moment possible
  to access certain system files.
Several aims are pursued by basic call-processing:
• Escalation to higher recovery levels with serious effects on the system (for example,
  automatic fallback to a saved generation) is prevented.
• Switching operation is maintained.
• The operation and maintenance personnel have the opportunity to localize and neutralize
  the fault while switching operation is continued.
During basic call-processing:
• Switching operation is continued without any restrictions.
• The call-processing functions can be used.
• The LTG, RSU, DLU and CCNC continue to run without any functional restrictions
  in the normal call-processing mode.


Restrictions in the basic call-processing mode
- The operating functions are restricted to commands used for fault clearance. The
  relevant actions are described in the emergency manual EMCYMN.
- The charge units for calls are calculated on the basis of the table with the lowest
  charge rates.
- The transition from basic call-processing back to normal operation is performed,
  depending on the preceding situation, by means of recovery level ISTART1 (start
  via MML command) or recovery level ISTART2 (start via BOOT key). The relevant
  actions and the exact procedures are described in the emergency manual
  EMCYMN.
- No messages are entered in the history files during basic call-processing.
2.7 Central and Installation Recovery: Effects of Recovery and Recovery Actions
The effects of recovery and the recovery actions are listed according to various criteria
in three tables:
- Effects of recovery in the CP on calls and call-charge data.
  The effects of recovery on calls and on call-charge data in the CP memory are
  shown in table Effects of recovery by the central and installation recovery levels on
  calls and call-charge data.
- Recovery actions in the CP for processors and the database.
  The main recovery actions for base processors and call processors and for the
  database in the CP memory are shown in table Main recovery actions of the central
  and installation recovery levels in the CP.
- Effects of recovery on the subsystems of the call-processing periphery.
  Central recovery and installation recovery initiate different recovery actions in the
  subsystems of the call-processing periphery (see table Handling of the call-processing
  periphery by central and installation recovery). Also listed in that table are the
  exceptional cases in which no recovery actions are performed for certain subsystems.
The subsystems of the call-processing periphery are as follows:
- LTG, RSU, DLU and CCNC.
- Switching network (SN).
- Message buffer (MB).
The LTG, RSU, DLU and CCNC have their own software which is loaded from the
CP. The SN and MB have no software; they are activated and reset by the CP.
Recovery levels: reciprocal delimitation
Tables Effects of recovery by the central and installation recovery levels on calls and
call-charge data, Main recovery actions of the central and installation recovery levels in
the CP and Handling of the call-processing periphery by central and installation recovery
mentioned above also show the differences between the recovery levels of central and
installation recoveries with respect to their recovery actions and effects, and how their
functions are reciprocally delimited.




Recovery levels: NSTART0
  Effects of recovery on existing calls:
    The existing calls are maintained.
  Effects of recovery on calls being set up or cleared down:
    Calls which are in the process of being set up or cleared down are maintained and
    processed further without any problem. Exception: in the case of an alarm-induced
    master switchover, it is possible for a single call being set up or cleared down to
    be lost.
  Effects of recovery on incoming seizure requests:
    No effect.
  Effects of recovery on call-charge data in the CP memory:
    The call-charge data in the CP memory are retained.

Recovery levels: NSTART1, NSTART2, NSTART3, NSTART1B
  Effects of recovery on existing calls:
    The existing calls are maintained.
  Effects of recovery on calls being set up or cleared down:
    On the BAP and CAP processors, it is possible for one call per processor that is in
    the process of being set up or cleared down to be lost. All other calls in the
    process of being set up or cleared down are maintained and processed further
    without any problem.
  Effects of recovery on incoming seizure requests:
    With NSTART1 and NSTART1B: no effect.
    With NSTART2 and NSTART3: call acceptance in the CP is interrupted briefly during
    recovery.
  Effects of recovery on call-charge data in the CP memory:
    The call-charge data in the CP memory are retained.

Recovery levels: ISTART1, ISTART1B, ISTART2, ISTART2R, ISTART2F
  Effects of recovery on existing calls:
    The existing calls are cleared down.
  Effects of recovery on calls being set up or cleared down:
    The calls in the process of being set up or cleared down are lost.
  Effects of recovery on incoming seizure requests:
    Are rejected by the LTGs during recovery.
  Effects of recovery on call-charge data in the CP memory:
    The call-charge data in the CP memory are retained. One exception to this is
    ISTART2, see Legend.

Table 1 Effects of recovery by the central and installation recovery levels on calls
and call-charge data


Recovery levels: ISTART2G
  Effects of recovery on existing calls:
    The existing calls are cleared down.
  Effects of recovery on calls being set up or cleared down:
    The calls in the process of being set up or cleared down are lost.
  Effects of recovery on incoming seizure requests:
    Are rejected by the LTGs during recovery.
  Effects of recovery on call-charge data in the CP memory:
    The CP memory is formatted before loading. Restoration of the call-charge data by
    means of the call-charge database stored in the user file, as described in the
    Maintenance Manual MMN:SW.

Recovery levels: INSTALL
  Effects of recovery on existing calls: -
  Effects of recovery on calls being set up or cleared down: -
  Effects of recovery on incoming seizure requests: -
  Effects of recovery on call-charge data in the CP memory: -

Table 1 Effects of recovery by the central and installation recovery levels on calls
and call-charge data (Cont.)

Note relating to table Effects of recovery by the central and installation recovery levels
on calls and call-charge data: Handling of call-charge data in CP main memory in
connection with ISTART2:
- When ISTART2 is started by the software error treatment or via an MML command,
  the CP memory is formatted before loading. This clears the call-charge data from
  the memory. The basis for continuing call-charge registration is the call-charge
  database last saved. After ISTART2 has finished running, the call-charge count last
  saved is automatically loaded from the system disk into the CP memory; see also
  ISTART2 in section System Recovery.
- When ISTART2 is started via the BOOT key, the system operator is able to specify
  that the CP memory is not formatted before loading. In this case, the call-charge
  data in the CP memory are retained; see also ISTART2 in section System Recovery.

Recovery levels: NSTART0
  Main recovery actions in the BAP and CAP processors:
    New start of those processes not relevant to call-processing in the BAP master.
  Main recovery actions in the database of the common memory:
    None.

Table 2 Main recovery actions of the central and installation recovery levels
in the CP
Recovery levels: NSTART1
  Main recovery actions in the BAP and CAP processors:
    New start of the cyclic processes and resumption of switching operation.
  Main recovery actions in the database of the common memory:
    None.

Recovery levels: NSTART2
  Main recovery actions in the BAP and CAP processors:
    New start of the cyclic processes and resumption of switching operation.
  Main recovery actions in the database of the common memory:
    Reloading of semipermanent data.

Recovery levels: NSTART3
  Main recovery actions in the BAP and CAP processors:
    - In exceptional cases: reloading of the program code of all resident processes.
    - New start of the cyclic processes and resumption of switching operation.
  Main recovery actions in the database of the common memory:
    - Reloading of semipermanent data and certain transient data.
    - Dynamic initialization of specific transient data. Exception: memory areas for
      call-charge data and for transient call-processing data.

Recovery levels: NSTART1B
  Main recovery actions in the BAP and CAP processors:
    No loading processes, because it is not possible at the moment to access certain
    system files.
    New start of the cyclic processes relevant to call-processing and starting of basic
    call-processing.
  Main recovery actions in the database of the common memory:
    No loading processes, because it is not possible at the moment to access certain
    system files.

Recovery levels: ISTART1, ISTART1B
  Main recovery actions in the BAP and CAP processors:
    Reloading of the program code of all resident processes.
    - With ISTART1: new start of all cyclic processes and resumption of switching
      operation.
    - With ISTART1B: new start of the cyclic processes relevant to call-processing and
      starting of basic call-processing.
  Main recovery actions in the database of the common memory:
    Reloading of semipermanent data and certain transient data.
    Dynamic initialization of specific transient data. Exception: memory area for
    call-charge data.

Recovery levels: ISTART2, ISTART2R, ISTART2F
  Main recovery actions in the BAP and CAP processors:
    Reloading of the program code of all resident processes.
    New start of the cyclic processes and resumption of switching operation.
    See Legend with respect to ISTART2.
  Main recovery actions in the database of the common memory:
    Formatting of the CP memory with ISTART2 only, see Legend.
    Reloading of semipermanent data and specific transient data.
    Dynamic initialization of specific transient data (with ISTART2R and ISTART2F,
    memory area for call-charge data excepted).

Table 2 Main recovery actions of the central and installation recovery levels
in the CP (Cont.)


Recovery levels: ISTART2G
  Main recovery actions in the BAP and CAP processors:
    With ISTART2G, loading of a saved generation (fallback to a previous software
    version):
    - Reloading of the program code from a saved generation.
    - New start of the cyclic processes and resumption of switching operation.
  Main recovery actions in the database of the common memory:
    With ISTART2G, loading of a saved generation (fallback to a previous software
    version):
    - Formatting of the CP memory.
    - Reloading of semipermanent and specific transient data from a saved generation.
    - Dynamic initialization of specific transient data (including call-charge data).

Recovery levels: INSTALL
  Main recovery actions in the BAP and CAP processors:
    Loading of the program code into the BAP master.
    Manual starting of selected processes, for example, of logical input/output.
  Main recovery actions in the database of the common memory:
    Formatting of the CP memory, see Legend.
    Loading of semipermanent and transient data.

Table 2 Main recovery actions of the central and installation recovery levels
in the CP (Cont.)

Legend concerning ISTART2 and INSTALL:
- When ISTART2 is started via the BOOT key or when INSTALL is started, the system
  operator is able to specify the following:
  - Whether the memory is to be formatted before loading.
  - Which generation is to be loaded: the current generation, the backup generation
    or the golden generation; loading of a saved generation is a fallback to a
    previous software status; see also ISTART2 in section System Recovery.
- When ISTART2 is started by the software fault analysis or via an MML command,
  the CP memory is formatted before loading; see also ISTART2 in section System
  Recovery.




Recovery levels: NSTART0, NSTART1, NSTART2, NSTART3, NSTART1B
  Regular cases (recovery actions in the call-processing subsystems):
    Generally speaking no recovery actions.
  Exceptional cases (certain call-processing subsystems which are in specific operating
  states are excluded from recovery actions):
    Generally speaking no recovery actions.

Recovery levels: ISTART1, ISTART1B
  Regular cases (recovery actions in the call-processing subsystems):
    Resetting and activating all subsystems.
    Initiating recovery:
    - in the LTG an initial start 1 (see section Recovery Level 2.1 (LTG Initial Start 1))
    - in the RSU a recovery level 2.1 (see section Recovery Level 2.1)
    - in the DLU an initial start (see section Recovery Level Initial Start)
    - no recovery actions are required for the CCNC
  Exceptional cases:
    No handling of the LTG, MB and SN in the operating states: planned and
    maintenance blocked.

Recovery levels: ISTART2
  Regular cases (recovery actions in the call-processing subsystems):
    Resetting and activating all subsystems.
    Initiating recovery:
    - in the LTG an initial start 1 (see section Recovery Level 2.1 (LTG Initial Start 1))
    - in the RSU a recovery level 2.1 (see section Recovery Level 2.1)
    - in the DLU an initial start (see section Recovery Level Initial Start)
    - in the CCNC an initial start for the entire CCNC (see section Escalation Strategy)
  Exceptional cases:
    No handling of the LTG, MB and SN in the operating states: planned and
    maintenance blocked.

Table 3 Handling of the call-processing periphery by central and installation
recovery


Recovery levels: ISTART2R, ISTART2G, ISTART2F
  Regular cases (recovery actions in the call-processing subsystems):
    Resetting and activating all subsystems.
    Initiating recovery:
    - in the LTG an initial start 2 (see section Recovery Level 2.2 (LTG Initial Start 2))
    - in the RSU a recovery level 2.2 (see section Recovery Level 2.2)
    - in the DLU an initial start (see section Recovery Level Initial Start)
    - in the CCNC an initial start for the entire CCNC (see section Recovery Level 2,
      Initial Start for the Entire CCNC)
  Exceptional cases (certain call-processing subsystems which are in specific operating
  states are excluded from recovery actions):
    In ISTART2R and ISTART2G: no handling of the LTG, MB and SN in the operating
    states: planned and maintenance blocked.
    In ISTART2F: no handling of the LTG, MB and SN in the operating state maintenance
    blocked.

Recovery levels: INSTALL
  Regular cases (recovery actions in the call-processing subsystems):
    Generally speaking no recovery actions.
  Exceptional cases:
    Generally speaking no recovery actions.

Table 3 Handling of the call-processing periphery by central and installation
recovery (Cont.)

Notes relating to table Handling of the call-processing periphery by central and
installation recovery: ISTART2 and ISTART2G:
- When ISTART2 is started via the BOOT key, the system operator is able to specify
  the following:
  - Which generation is to be loaded (the current generation, the backup generation
    or the golden generation).
  - Whether the LTGs are also to be loaded with the program code.
- With an ISTART2G, a saved generation (backup generation or golden generation)
  is loaded.
2.8 Starting Options
Starting options of central recovery
The recovery levels of central recovery can be started either automatically or manually;
see table Starting options for recovery.
- Automatic start of the recovery levels by the software error treatment:
  - Following an SWSG call from an error-detecting user or system process.
  - Following escalation of a recovery level.




- Automatic start of certain recovery levels by the hardware fault analysis:
  - Following certain hardware faults in the CP.
- Manual start of the recovery levels by the system operator:
  - Via MML command.
  - Via BOOT key.
When recovery level ISTART2 is started manually via the BOOT key, the system
operator is also able to select and specify the following:
- The system disk from which loading is performed.
- The generation to be loaded (current generation or one of the saved generations,
  such as the backup generation or golden generation).
- Whether the CP memory is to be formatted before loading.
- Whether the LTGs are also to be loaded with the program code.
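The four operator choices for a manual ISTART2 via the BOOT key can be pictured as one small record. The following is only an illustrative model: the class name `BootKeyIstart2Options` and its field names are assumptions made for this sketch and do not correspond to any actual system interface or MML syntax.

```python
from dataclasses import dataclass

_GENERATIONS = ("current", "backup", "golden")


@dataclass(frozen=True)
class BootKeyIstart2Options:
    """Operator choices for a manual ISTART2 via the BOOT key (hypothetical model)."""

    system_disk: int        # system disk from which loading is performed (0 or 1)
    generation: str         # "current", "backup" or "golden"
    format_cp_memory: bool  # format the CP memory before loading?
    load_ltg_code: bool     # also load the LTGs with the program code?

    def __post_init__(self) -> None:
        # Reject values that the text does not mention.
        if self.system_disk not in (0, 1):
            raise ValueError("system_disk must be 0 or 1")
        if self.generation not in _GENERATIONS:
            raise ValueError(f"generation must be one of {_GENERATIONS}")

    def is_fallback(self) -> bool:
        """Loading a saved generation is a fallback to a previous software status."""
        return self.generation != "current"
```

For instance, `BootKeyIstart2Options(0, "golden", True, False).is_fallback()` is true, reflecting that the golden generation is one of the saved generations.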
Starting options of installation recovery
INSTALL is started manually via the BOOT key. After INSTALL has run, ISTART2F is
started via an MML command.
As a result of ISTART2F, installation recovery is exited and the system is transferred to
switching operation.
Columns, from left to right: automatic starting by software error treatment; automatic
starting by hardware fault analysis; manual starting by MML command; manual starting by
BOOT key.
NSTART 0 NSTART 0 NSTART 0 ---
NSTART 1 NSTART 1 NSTART 1 ---
NSTART 2 NSTART 2 NSTART 2 ---
NSTART 3 NSTART 3 NSTART 3 ---
NSTART 1B --- --- ---
ISTART 1 ISTART 1 ISTART 1 ---
ISTART 2R * ISTART 2 ISTART 2 ISTART 2 (R)**
ISTART 2 (G)***
--- --- ISTART 2R ---
--- --- ISTART 2F ISTART 2F
ISTART 2G -- ISTART 2G
ISTART 1B --- --- ---
Table 4 Starting options for recovery
Notes relating to table Starting options for recovery:
* Started after total failure of the periphery
** Manual recovery with reloading of LTG code and data
*** When the BOOT key is pressed, an ISTART is first executed. If the system operator
specifies an APS generation that is older than the current generation, an
ISTART2G is executed.


For the sake of simplicity, the starting options of INSTALL and ISTART2F are also listed
in table Starting options for recovery. The exact procedures and conditions for starting
are described in the emergency manual EMCYMN.
Starting options of recovery in the call-processing periphery
Recovery in the subsystems (such as the LTG, RSU, DLU or CCNC) can be started
either automatically or manually.
- Automatically by the safeguarding programs of the subsystem:
  - When called from a fault-detecting program in the subsystem.
  - Following escalation of the recovery levels in the subsystem.
  - Following a system recovery which runs in the CP and initiates a recovery in the
    subsystems.
- Manually by the system operator:
  - Via an MML configuration job which initiates a recovery in the subsystem, for
    example, connecting an LTG for return to service after diagnosis.
2.9 Symptom Saving
Recovery has two modes of symptom saving:
- Standard Symptom Saving
- Data Collection: Additional Symptom Collection and Immediate Symptom Saving
2.9.1 Standard Symptom Saving
Standard symptom saving takes place in all central recovery levels in the CP, and in all
recovery levels in the LTG, RSU, DLU and CCNC. Before the recovery actions are
started, standard symptom saving stores the symptoms in a save area. This save area
is a recovery-safe data area in the respective memories of the CP, LTG, RSU, DLU
and CCNC.
In cases of standard symptom saving, the following information is output or saved:
Advisory message informing that a fault has occurred (data of the SWSG supervisor
call, module name, subsystem).
Advisory message about the loading processes in the LTG, RSU, DLU or CCNC
(the message is not given when loading is not necessary).
Advisory message informing that recovery has been completed and that switching
operation is continuing without problem.
Symptoms of the fault.
The advisory messages are generally output at the management system and saved in
a history file; see the section entitled History files in the system.
Saving and output of the symptoms of a fault is organized differently for central recovery
compared with recovery in the LTG, RSU, DLU and CCNC; see the following sections.
Standard symptom saving with central recovery
Before recovery is started, the software error treatment of the CP compiles the
symptoms in the form of a symptom package and stores this symptom package in a
save area in the CP memory.
Once the recovery has been completed, the software fault analysis transfers the
symptom package from the save area to the history file called SG.SESYMP; see the
section entitled History files in the system.




Output of the symptom packages from the SG.SESYMP history file is controlled by
means of a procedure which is described in the Maintenance Manual MMN:SW.
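The two-stage handling described above (store the symptom package in a recovery-safe save area before recovery, transfer it to the history file afterwards) can be sketched as a toy model. The class `CentralSymptomSaving` and its method names are assumptions introduced for illustration; the list `sg_sesymp` merely stands in for the SG.SESYMP history file and says nothing about its real format.

```python
from typing import List, Optional


class CentralSymptomSaving:
    """Toy model of standard symptom saving with central recovery."""

    def __init__(self) -> None:
        self.save_area: Optional[dict] = None  # recovery-safe area in the CP memory
        self.sg_sesymp: List[dict] = []        # stands in for history file SG.SESYMP

    def before_recovery(self, symptoms: dict) -> None:
        # The symptom package is compiled and stored before recovery starts.
        self.save_area = dict(symptoms)

    def after_recovery(self) -> None:
        # Once recovery has completed, the package moves to the history file.
        if self.save_area is not None:
            self.sg_sesymp.append(self.save_area)
            self.save_area = None
```

The point of the split is that nothing is written to the history file while recovery is running; the save area alone has to survive the recovery.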
Standard symptom saving with a recovery in the LTG, RSU, DLU and CCNC
In the case of a recovery in the LTG, RSU, DLU or CCNC, symptom saving is organized
as follows: the respective safeguarding programs store the symptoms in an error
notebook. This error notebook is a save area that is declared in the memories of the LTG
and CCNC. The error notebook for the DLU is also declared in the memories of the LTG.
The symptoms are output from the respective error notebook when requested by the
system operator by means of an MML command. The output medium can be, for
example, a management system or magneto-optical disk.
History files in the system
A number of history files are installed (redundantly on the two system disks) for the
purpose of saving the various pieces of information. History files vary in number and
type. Therefore, the following list can only be used as an example:
- SG.OPER contains safeguarding messages that are exchanged between the CP and the
  periphery along with CP-internal messages which are connected to the periphery.
- SG.SESYMP contains the symptom packages of the software errors that have occurred.
- HF.ARCHIVE contains the recovery end masks and alarm and fault messages that are
  initiated by maintenance, routine-test or auditing programs.
- HF.ALARM contains alarm messages that are displayed on the system panel display.
- AM.ALARM contains alarm messages about acute alarms. If the alarm is no longer acute
  (fault is being analyzed or has been cleared), the relevant alarm message in AM.ALARM
  is cleared again.
- HF.MCP.HWERROR contains messages about sporadic hardware faults.
2.9.2 Data Collection: Additional Symptom Collection and Immediate Symptom Saving
Reason for data collection
Additional symptom saving, also known as data collection, takes place in addition to
standard symptom saving on interruption (escalation) of a system recovery due to a
software error.
In the exceptional case of an active system recovery being interrupted, data collection
is started before new system recovery actions and thus before formatting of the memory.
Incorporation of data collection in the sequence of operations
The sequence of events is as follows:
- The exceptional case that an active ISTART recovery level must be interrupted has
  arisen.
- Due to this interruption of ISTART, data collection is activated and, in addition to the
  standard symptoms, it collects additional supplementary symptoms about the
  ISTART interruptions and immediately saves the symptoms on the two system disks
  (redundant saving).
- Once data collection has been completed, recovery starts the next higher ISTART
  recovery level (see also the escalation strategy described in section Escalation
  Strategy for Central Recovery Levels). If a further software fault occurs while the
  next higher ISTART recovery level is running, the ISTART recovery level is
  interrupted again: once again, data collection collects data collection symptoms and
  afterwards, a new ISTART recovery level is started again.
- Once the ISTART recovery level has been successfully completed (recovery has
  been completed and the system is in normal operating mode), the system operator
  enters the appropriate MML command to initiate processing of the data collection
  symptoms and transfer of the processed data collection symptoms to a
  magneto-optical disk.
- The magneto-optical disk containing the processed data collection symptoms is
  used by the system manufacturer for analyzing and clearing the cause of the fault.
Operational sequence of data collection
Data collection consists of the following three processes:
Collecting data collection symptoms about the ISTART interruption.
Saving the collected data collection symptoms immediately to the system disks.
Editing the saved data collection symptoms into a readable format and outputting
them to magneto-optical disk.
These three processes are outlined in the following sections.
Collecting data collection symptoms about the ISTART interruption
The data collection symptoms which data collection collects can be divided into two
groups:
Standard symptoms.
Additional symptoms about the interruption (escalation) of NSTART/ISTART
recovery levels.
The standard symptoms include among other things:
Symptom package of the software fault that has occurred.
Logbook of the safeguarding monitor.
The additional symptoms about the interruption of the ISTART recovery level include
among other things:
Control and status data of central recovery.
Control and status data of recovery in the LTG.
Trace points of the interrupted ISTART recovery levels (trace-point history).
History of the recovery escalation before the interruption of ISTART.
Significant data of the operating system.
Operating states of the central units of the CP.
Saving the collected data collection symptoms immediately to the system disks
Data collection outputs the collected data collection symptoms immediately:
- To the system file DC.V00 created on system disk 0.
- To the system file DC.V01 created on system disk 1.
Both DC.V files are created automatically by data collection. If further ISTART
interruptions occur and data collection therefore saves further data collection
symptoms, the additional DC.V files are labelled from DC.V02 to DC.V07. In this case all
DC.V<nn> files with even numbers are stored on system disk 0, and all DC.V<nn>
files with odd numbers are stored on system disk 1.
The data collection symptom saving function is designed in such a manner that, during
a system recovery, up to 4 x 2 DC.V<nn> files are created (four on system disk 0 and four
on system disk 1). This means that when a running system recovery is interrupted four
times, data collection saves the data collection symptoms four times altogether, each
time to two DC.V<nn> files, one on system disk 0 and one on system disk 1.
The labelling and consecutive numbering of the DC.V<nn> files is depicted in table
Identification of system files created by data collection for saving data collection
symptoms (DC.V<nn>).
Should a software error occur while the system is in basic call-processing operation,
and should this software error cause an escalation within basic call-processing
operation, then two further files are created: DC.B00 on system disk 0 and DC.B01 on
system disk 1. This labelling is illustrated in table Identification of system files
created by data collection for saving data collection symptoms (DC.B<nn>).
Editing the saved data collection symptoms into a readable format and outputting
them to magneto-optical disk
The data collection symptoms saved to the DC. files are in an unprocessed, non-
readable format as simple character strings. This unprocessed format of the symptoms
allows particularly quick data transfer from the CP memory to the system disks. This
quick transfer is important when the critical system state ISTART interruption occurs.
When the system is running in normal operation mode again, the data collection
symptoms are put into a readable format for analysis purposes and passed on to the
system manufacturer on a magneto-optical disk.
The exact procedures are described in the manual Saving of Error Symptoms.
Interruption during a     File created on     File created on
system recovery           system disk 0       system disk 1
1st interruption          DC.V00              DC.V01
2nd interruption          DC.V02              DC.V03
3rd interruption          DC.V04              DC.V05
4th interruption          DC.V06              DC.V07
Table 5 Identification of system files created by data collection for
saving data collection symptoms (DC.V<nn>)

Interruption during a     File created on     File created on
system recovery           system disk 0       system disk 1
(escalation from basic
operation in basic
operation)
1st interruption          DC.B00              DC.B01
Table 6 Identification of system files created by data collection for
saving data collection symptoms (DC.B<nn>)
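The numbering scheme shown in Tables 5 and 6 can be expressed as a short function. This is a sketch for illustration only: the function name `dc_file_names` and its parameters are assumptions, and the limits (four interruptions in normal operation, one in basic operation) are taken directly from the two tables.

```python
from typing import Tuple


def dc_file_names(interruption: int, basic_operation: bool = False) -> Tuple[str, str]:
    """Return (file on system disk 0, file on system disk 1) for an interruption.

    `interruption` counts from 1, as in Tables 5 and 6.
    """
    if basic_operation:
        # Table 6 lists a single interruption in basic operation.
        if interruption != 1:
            raise ValueError("only the 1st interruption is listed for basic operation")
        return ("DC.B00", "DC.B01")
    if not 1 <= interruption <= 4:
        raise ValueError("Table 5 lists interruptions 1 to 4 only")
    even = 2 * (interruption - 1)  # even numbers go to system disk 0
    return (f"DC.V{even:02d}", f"DC.V{even + 1:02d}")
```

For example, the third interruption maps to `("DC.V04", "DC.V05")`, which matches the third row of Table 5.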


2.10 Postprocessing
The task of recovery postprocessing is to initiate fault analysis for units that have failed
during the preceding system recovery.
These units are:
Central hardware units in the CP; for example, processors, input/output controls.
LTGs.
Recovery postprocessing is started by the operating system after a system recovery has
been completed, when the system resumes switching operations, and normal operation
starts up again.
Past history of the units which fail during a system recovery:
System recovery carries out a brief diagnosis of all units to be put into service. This brief
diagnosis tests the basic functions of the hardware. If a fault occurs, system recovery
aborts start-up of the unit, labels the unit as being defective and informs recovery post-
processing of the failure.
Special situation during basic call-processing:
No recovery postprocessing is performed in the basic call-processing mode.
Recovery postprocessing for failed units is performed when the system resumes normal
operation.
If an MB unit or an SN unit has failed during the preceding system recovery or fails
during ISTART1B, basic call-processing outputs an alarm message. The fault in the
failed MB or SN unit must be cleared manually.
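The deferral rule described above (failed units are reported during system recovery, but fault analysis is initiated only once the system is back in normal operation) can be sketched as follows. The class `RecoveryPostprocessing` and its methods are hypothetical names chosen for this sketch; the manual handling of failed MB and SN units is not modelled.

```python
from typing import List


class RecoveryPostprocessing:
    """Toy model: fault analysis is deferred while in basic call-processing."""

    def __init__(self) -> None:
        # Units labelled as defective by the brief diagnosis during system recovery.
        self.pending: List[str] = []

    def report_failed_unit(self, unit: str) -> None:
        self.pending.append(unit)

    def run(self, basic_call_processing: bool) -> List[str]:
        """Return the units for which fault analysis is initiated now."""
        if basic_call_processing:
            # No recovery postprocessing in the basic call-processing mode.
            return []
        analysed, self.pending = self.pending, []
        return analysed
```

Calling `run(basic_call_processing=True)` leaves the pending list untouched, so the same units are picked up by the first call made after normal operation has resumed.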




Recovery Installation Recovery
Id:0900d8058008c5be
3 Installation Recovery
Installation recovery consists of the following two recovery levels:
- INSTALL recovery.
- Initial start 2 forced recovery (ISTART2F).
Both recovery levels play an active and supporting role in starting up the system:
- INSTALL recovery is used for the initial start-up of a new network node.
- Initial start 2 forced recovery (ISTART2F) has several applications: initial start-up
  of a new network node and APS change (use of a new APS version) in an existing
  network node.
3.1 INSTALL Recovery
Initial start-up of the system in a new network node runs through several phases. In the
bootstrap phase of the system, INSTALL recovery runs in addition to other programs.
The aim is the following:
- To produce an APS generation.
- To test it.
- To install it.
INSTALL recovery plays an active role in some of these processes, and in others sets
up the preconditions to allow the following tasks to be carried out:
- Producing an APS generation from the APS master magneto-optical disk.
- Expanding the database of the first APS generation to the customer's requirements.
- Incorporating network node data into the database.
- Installing the first APS generation on the two system disks.
- Running up the first APS generation in order to test whether the system runs up
  properly in the CP.
- Copying the APS generation to a save magneto-optical disk.
INSTALL recovery requires inputs from the system operator: to begin with, the system
operator must start selected CP processes, so that INSTALL is able to carry out specific
basic functions such as input/output.
To summarize, INSTALL produces a serviceable APS generation specific to the network
node. This is the basis on which ISTART2F runs up the system and initiates switching
operation.
3.2 Initial Start 2 Forced Recovery (ISTART2F)
The purpose of the runup with ISTART2F is to start up switching operation.
The basis for the runup is the APS generation produced by INSTALL recovery, referred
to below as the current generation.
The reason for a runup can be:
Initial start-up of a new network node.
APS change (use of a new APS version) in an existing network node.


The runup processes, in which other programs also run in addition to ISTART2F, are
performed centrally from the BAP master. Roughly speaking, they can be subdivided as
follows:
Central recovery in the BAP master.
This encompasses loading the current generation into the common memory and
loading the program code into the local memory of the BAP master.
Local recovery in the remaining CP processors.
This encompasses loading the program code into the local memories of BAP spare
and CAP.
Runup in the call-processing periphery.
This encompasses synchronization with LTG/media control task (MCT), DLU and
CCNC, loading and activating LTG/MCT, DLU and CCNC.
Start of the system with start-up of switching operation.
This encompasses starting the processes in the BAP master, setting up the nailed-
up connections, and start-up of switching operation.
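The four groups of runup processes listed above can be written down as an ordered structure. This is a sketch only: the names `RUNUP_PHASES` and `runup_steps` are assumptions, the step wording is paraphrased from the list, and the strictly sequential order is a simplification of what the text itself calls a rough subdivision.

```python
from typing import Iterator, List, Tuple

# Phase titles and steps paraphrased from the list above.
RUNUP_PHASES: List[Tuple[str, List[str]]] = [
    ("Central recovery in the BAP master", [
        "load the current generation into the common memory",
        "load the program code into the local memory of the BAP master",
    ]),
    ("Local recovery in the remaining CP processors", [
        "load the program code into the local memories of BAP spare and CAP",
    ]),
    ("Runup in the call-processing periphery", [
        "synchronize with LTG/MCT, DLU and CCNC",
        "load and activate LTG/MCT, DLU and CCNC",
    ]),
    ("Start of the system with start-up of switching operation", [
        "start the processes in the BAP master",
        "set up the nailed-up connections",
        "start up switching operation",
    ]),
]


def runup_steps() -> Iterator[str]:
    """Yield the runup steps in the order in which the phases are listed."""
    for phase, steps in RUNUP_PHASES:
        for step in steps:
            yield f"{phase}: {step}"
```

Iterating over `runup_steps()` gives a flat checklist that starts with loading the current generation and ends with the start-up of switching operation.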
Figure Main recovery actions in the CP, LTG/MCT, MCP, RSU, DLU and CCNC with an
ISTART2F, ISTART2R, ISTART2G illustrates recovery actions which not only run under
an ISTART2F but also under system recovery levels ISTART2R and ISTART2G.




Figure 1 Main recovery actions in the CP, LTG/MCT, MCP, RSU, DLU and CCNC
with an ISTART2F, ISTART2R, ISTART2G
[Figure not reproduced. It shows the runup in the CP (load program code and data;
synchronize with LTG/MCT, MCP, RSU, DLU and CCNC), followed by the runup in the
call-processing periphery: loading code and data to the LTG/MCT, MCP, RSU and CCNC,
loading DLU data, activating each of these units, setting up the nailed-up connections
for circuits and for SS7, activating the signaling links between CCNC and SN, and
starting call-processing/switching operations (including recovery postprocessing) and
SS7 signaling operations.]


Recovery
Id:0900d8058008c5b6
Central Recovery
4 Central Recovery
The SASDATS (saving of software data with statistics) function is used to document
software problems. The process that has detected a software problem can use
SASDATS (or SASDAT) to provide a documentary record at the same time as attempting
to neutralize the fault. The majority of the remaining recovery actions in the CP
involve a CP recovery at level NSTART0. This is the lowest recovery level with the least
scope (after SASDATS). An NSTART0 causes practically no degradation of switching
operations: call-processing is able to continue without restriction.
The other NSTART levels of CP recovery are started in response to faults that affect
more than one CP process and in some cases require the reloading of data and program
code.
The ISTART levels of system recovery are started in cases where further recovery
action is required in the LTG, RSU, DLU and CCNC, in addition to recovery action in the
CP.
The escalation strategy for central recovery levels defines:
The conditions under which escalation to the next higher recovery level should take
place.
The conditions under which call-processing basic operation should be started.
4.1 CP Recovery
CP recovery is organized into new start levels (NSTART). The new start levels
(NSTART) are classified as follows:
NSTART0 is started in response to trouble which is restricted locally in the BAP
master to one process that is not relevant to call-processing.
NSTART1 is started in response to trouble affecting several user processes which
does not require the reloading of data or program code.
NSTART2 is started in response to trouble that requires reloading of semipermanent
data.
NSTART3 is started in response to trouble that requires reloading of semipermanent
data, certain transient data and, in exceptional cases, program code.
NSTART1B is started in exceptional cases only: when it is not possible to access
certain system files. NSTART1B leads to call-processing basic operation.
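The classification above can be condensed into a small lookup table. The following Python sketch is illustrative only: the names `NStartLevel` and `NSTART_LEVELS` are hypothetical and do not correspond to any structure in the product; the field values are transcribed from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NStartLevel:
    """Summary of one CP recovery level (illustrative, not a product structure)."""
    reloads_semipermanent_data: bool
    reloads_transient_data: bool
    may_reload_program_code: bool
    leads_to_basic_operation: bool


# Values transcribed from the classification of the new start levels.
NSTART_LEVELS = {
    "NSTART0": NStartLevel(False, False, False, False),
    "NSTART1": NStartLevel(False, False, False, False),
    "NSTART2": NStartLevel(True, False, False, False),
    "NSTART3": NStartLevel(True, True, True, False),
    "NSTART1B": NStartLevel(False, False, False, True),
}


def levels_reloading_semipermanent_data():
    """Return the names of the levels that reload semipermanent data."""
    return [name for name, level in NSTART_LEVELS.items()
            if level.reloads_semipermanent_data]


print(levels_reloading_semipermanent_data())  # ['NSTART2', 'NSTART3']
```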
4.1.1 New Start Level 0 (NSTART0)
NSTART0 is started in response to trouble which is restricted locally in the BAP master
to one process that is not relevant to call-processing. Because the trouble has a minor
effect only, recovery actions are limited to processes in the BAP master that are not
relevant to call-processing.
The call processes CALLP that run in the BAP master, BAP spare and CAP are not
affected by the trouble.
Reason
An NSTART0 can be triggered by:
An SWSG call from a process that is not relevant to call-processing to deal with a
software error with local effects.
Hardware trouble, for instance alarm-induced master switchover (alarm switchover)
due to hardware failure in the BAP master.




Escalation of SASDATS calls from processes that are not relevant to call-processing
(see section Escalation Strategy for Central Recovery Levels).
Start via MML command.
Recovery action
NSTART0 executes a new start of processes that are not relevant to call-processing in
the BAP master.
In the case of an alarm-induced master switchover the following occurs:
The old BAP spare becomes the new BAP master.
The old BAP master becomes BAP spare and then undergoes diagnosis in recovery
postprocessing.
Effects of recovery
Switching operations are not affected while NSTART0 is dealing with the trouble situa-
tion:
The call processes CALLP and the processes that are relevant to call-processing
continue to run without restriction and perform all required call-processing tasks
without delay.
Call-charge data registered in the CP memory are maintained.
Established calls are maintained.
Calls in the process of being set up or cleared down are further processed as
normal. Only an alarm-induced master switchover can cause a single call that is
being set up or cleared down to be terminated and lost.
Escalation
If NSTART0 is started several times, this can escalate to an NSTART1; see section
Escalation Strategy for Central Recovery Levels.
4.1.2 New Start Level 1 (NSTART1)
NSTART1 is started in response to trouble affecting the user processes in more than
one CP processor.
Reason
An NSTART1 can be triggered by:
SWSG call from user processes.
Certain hardware failures, for instance failure of a single processor during a specific
critical section of a program, such as the start/end treatment of a process.
Escalation of SASDATS calls from processes relevant to call-processing.
Escalation from recovery level NSTART0.
Start via MML command.
Recovery action
NSTART1 causes:
New start of all cyclic processes.
Resumption of switching operations.
Effects of recovery
While NSTART1 is dealing with the trouble situation:
Call-charge data registered in the CP memory are maintained.
Established calls are maintained.


Switching operations are briefly interrupted.
Where calls are in the process of being set up or cleared down, one call per BAP
and CAP may be terminated and lost. However, loss of calls only occurs in one par-
ticular situation: If a processor is processing a call message and this processing is
interrupted by a recovery.
All other calls in the process of being set up or cleared down are further processed
as normal.
Escalation
NSTART1 escalates to NSTART3 if within the defined supervision time all of the follow-
ing conditions are fulfilled:
Further NSTART0s or NSTART1s occur.
The defined error threshold has been reached; see section Escalation Strategy for
Central Recovery Levels.
4.1.3 New Start Level 2 (NSTART2)
NSTART2 is started in cases where it is necessary to load semipermanent data in order
to neutralize a trouble situation.
Reason
An NSTART2 can be triggered by:
SWSG calls from user processes.
Certain hardware failures, for instance single processor outage during a particular
critical section of a program, such as an update procedure.
Start via MML command.
Recovery action
NSTART2 causes:
Reloading of semipermanent data.
Restarting of all cyclic processes.
Resumption of switching operations.
Effects of recovery
While NSTART2 is dealing with the trouble situation:
Call-charge data registered in the CP memory are maintained.
Established calls are maintained.
Switching operations are briefly interrupted.
Where calls are in the process of being set up or cleared down, one call per BAP
and CAP may be terminated and lost. However, loss of calls only occurs in one par-
ticular situation: If a processor is processing a call message and this processing is
interrupted by a recovery.
All other calls in the process of being set up or cleared down are further processed
as normal.
Escalation
NSTART2 escalates to NSTART3 or NSTART1B.
There is escalation to NSTART3 if the following conditions are fulfilled:
Other NSTARTs less than or equal to NSTART2 occur during the defined supervi-
sion time.
The defined error threshold is reached.




Access to system files is possible.
There is escalation to NSTART1B in exceptional cases if:
Access to certain system files on the two system disks is not possible at that time; see
also section Escalation Strategy for Central Recovery Levels.
4.1.4 New Start Level 3 (NSTART3)
NSTART3 is started in cases where it is necessary to load program code, semiperma-
nent data and certain transient data in order to neutralize the trouble.
Reason
An NSTART3 can be triggered by:
SWSG calls from a software-safeguarding process.
Start via MML command.
Escalation from recovery level NSTART1 or NSTART2.
Recovery action
NSTART3 causes:
Reloading of semipermanent data.
Reloading of transient data declared with default values and dynamic initialization of
transient data not declared with default values (exceptions: the memory areas for
call-charge data and for transient call data).
Restarting of all cyclic processes.
Resumption of switching operations.
Reloading of program code for all resident processes (only in certain situations, such
as the sporadic failure of the last CMY).
Effects of recovery
While NSTART3 is dealing with the trouble situation:
Call-charge data registered in the CP memory are maintained.
Established calls are maintained.
Switching operations are briefly interrupted.
Where calls are in the process of being set up or cleared down, one call per BAP
and CAP may be terminated and lost. However, loss of calls only occurs in one par-
ticular situation: If a processor is processing a call message and this processing is
interrupted by a recovery.
All other calls in the process of being set up or cleared down are further processed
as normal.
Escalation
NSTART3 escalates to ISTART1 or NSTART1B.
There is escalation to ISTART1 if the following conditions are fulfilled:
Other NSTARTs less than or equal to NSTART3 occur during the defined supervi-
sion time.
The defined error threshold is reached.
Access to system files is possible. (Recovery level ISTART1 belongs to system
recovery; see section System Recovery.)
There is escalation to NSTART1B in exceptional cases if:
Access to certain system files on the two system disks is not possible at that time; see
also section Escalation Strategy for Central Recovery Levels.


4.1.5 New Start Level 1B (NSTART1B)
Recovery level NSTART1B is a set of strict and wide-ranging security measures
designed to enable the system to continue switching operations in exceptional situations
where certain trouble situations occur at the same time (access to certain system files
is not possible at that time).
NSTART1B leads to call-processing basic operation. In this mode, the system's
performance is as follows:
Basic call-processing functions are available.
Full call-processing capacity is available without restriction.
System operating functions are restricted to those actions that are necessary for
fault clearance.
Call charges are calculated on the basis of the lowest rate table.
Access to certain system files on the two system disks is not possible.
The return from call-processing basic operation to normal operation is initiated by
starting recovery level ISTART2 manually using the BOOT key.
Reason
An NSTART1B is triggered if the following conditions are fulfilled:
An NSTART2 or NSTART3 or ISTART1 has occurred.
An exceptional case occurs in which access to certain system files on the two system
disks is not possible at that time. (Recovery level ISTART1 belongs to system
recovery; see section System Recovery.)
Recovery action
The recovery actions of NSTART1B correspond to those of an NSTART1 with the dif-
ference that the set of processes to be started is restricted to processes that are relevant
to call-processing. NSTART1B causes:
Restarting of cyclic processes that are relevant to call-processing.
Loading of the lowest rate table to the LTG.
Setting of call-processing basic operation.
Effects of recovery
While NSTART1B is setting basic operation:
Call-charge data registered in the CP memory are maintained.
Established calls are maintained.
Switching operations are briefly interrupted (setting of call-processing basic opera-
tion after completion of NSTART1B without errors).
Where calls are in the process of being set up or cleared down, one call per BAP
and CAP may be terminated and lost. However, loss of calls only occurs in one par-
ticular situation: If a processor is processing a call message and this processing is
interrupted by a recovery.
All other calls in the process of being set up or cleared down are further processed
as normal.
Access to certain system files is not possible at that time.




Escalation
An NSTART1B involves special supervision conditions, which are described briefly on
the basis of the following three scenarios:
Scenario 1: No trouble during the supervision time.
If no other trouble is encountered during the supervision time of 60 s, call-processing
basic operation can be run without problems.
Scenario 2: Trouble during the supervision time.
If trouble is encountered during the supervision time, then
The 60-s supervision timer is restarted.
The error counter is incremented by 1.
If further trouble occurs, then
The 60-s supervision time is started yet again.
The error counter is incremented by 1 again, until the error threshold of 10 is
reached.
Scenario 3: NSTART1B escalates to ISTART2.
If the error counter reaches the value of 10, software error treatment triggers an
escalation to recovery level ISTART2.
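The three scenarios above can be sketched as a small simulation. This is a minimal illustration, not the product's software error treatment: the function name `supervise_nstart1b` and the representation of trouble events as times in seconds after NSTART1B are assumptions. It is also assumed that trouble arriving after the supervision time has expired is no longer counted, which the text does not state explicitly.

```python
SUPERVISION_TIME_S = 60   # supervision time after NSTART1B
ERROR_THRESHOLD = 10      # error counter value that triggers ISTART2


def supervise_nstart1b(trouble_times_s):
    """Return 'ISTART2' on escalation, otherwise 'BASIC_OPERATION'.

    trouble_times_s: times (in seconds after NSTART1B) at which further
    trouble is detected. Hypothetical input format.
    """
    deadline = SUPERVISION_TIME_S
    error_counter = 0
    for t in sorted(trouble_times_s):
        if t >= deadline:
            # Scenario 1: the supervision time elapsed without further trouble.
            break
        # Scenario 2: restart the 60-s timer and count the error.
        error_counter += 1
        deadline = t + SUPERVISION_TIME_S
        if error_counter >= ERROR_THRESHOLD:
            # Scenario 3: threshold reached, escalate.
            return "ISTART2"
    return "BASIC_OPERATION"
```

For example, ten trouble events spaced five seconds apart keep restarting the timer and drive the counter to the threshold, whereas a single event 70 s after NSTART1B arrives after supervision has ended.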
4.2 System Recovery
The initial start recovery levels (ISTART) are classified as follows:
Initial Start Level 1 (ISTART1) is started in response to escalation from CP recovery
level NSTART3 or in response to certain hardware trouble situations in the CP.
Initial Start Level 1B (ISTART1B) leads to call-processing basic operation. This sit-
uation occurs when an ISTART1 was unable to neutralize a trouble event success-
fully.
Initial Start Level 2 (ISTART2) comprises a wider range of recovery action than
ISTART1, including formatting of the CP memory and loading of the LTGs. ISTART2 may
result from escalation from NSTART1B or ISTART1B. ISTART2 can also be started
manually, via MML command or using the BOOT key.
Initial Start Level 2R (ISTART2R) is a recovery level that can only be started by
means of MML command. An ISTART2R involves the conditional reloading of LTG,
RSU, DLU and CCNC.
Initial Start Level 2G (ISTART2G) corresponds to an ISTART2R with the following
differences:
The CP memory is formatted prior to loading.
A saved generation is loaded instead of the current generation, in other words
fallback to a previous software status of the network node.
4.2.1 Initial Start Level 1 (ISTART1)
ISTART1 is executed in cases where recovery action is required in LTG, RSU and DLU
in addition to the recovery action in the CP.
Reason
An ISTART1 can be triggered by:
SWSG calls from an operating-system process.
Escalation from NSTART3.
Certain hardware trouble situations, for example repeated failure of input/output
controls.
Start via MML command.


Recovery action in the CP
An ISTART1 causes:
Reloading of program code.
Reloading of semipermanent data.
Reloading of transient data declared with default values.
Dynamic initialization of transient data declared with non-default values (except:
call-charge memory area).
Restarting of all cyclic processes.
Resumption of switching operations.
Recovery action in the call-processing periphery
An ISTART1 causes:
Resetting and activation of
Message buffer
Switching network
Resetting and starting of recovery:
In the LTG an initial start 1, in other words: restarting of the LTG without loading;
see section Recovery Level 2.2 (LTG Initial Start 2).
In the DLU an initial start, in other words: the front-end LTG in B-function reloads
the semipermanent DLU data and restarts the DLU system; see section Recovery
Level 2.2 (LTG Initial Start 2).
In the RSU a recovery level 2.1, in other words: restarting the RSU without loading;
see section Recovery Level 2.1.
An ISTART1 initiated by the CP does not call for any recovery action in the CCNC.
Effects of recovery
While the ISTART1 is restoring system availability:
Call-charge data registered in the CP memory are maintained.
Switching operations are interrupted until the system has been booted successfully.
Established calls are cleared down.
Nailed-up connections are released, and set up again when the system has been
restarted successfully.
Any call requests arriving in the meantime are rejected by the LTGs.
Escalation
An ISTART1 escalates if the following conditions are fulfilled:
Other recoveries less than or equal to ISTART1 occur during the defined supervision
time.
The defined error threshold has been reached.
An escalation from ISTART1 leads to the setting of call-processing basic operation.
Call-processing basic operation is set in the following cases:
At recovery level ISTART1B if access to system files is possible.
At recovery level NSTART1B if access to certain system files on the two system
disks is not possible at that time (exceptional case); see also section Escalation
Strategy for Central Recovery Levels.




4.2.2 Initial Start Level 1B (ISTART1B)
ISTART1B leads to the setting of call-processing basic operation. In this mode, the
system's performance is as follows:
Basic call-processing functions are available.
Call-processing capacity is available without restrictions.
System operating functions are restricted to those actions that are necessary for
fault clearance.
Call charges are calculated on the basis of the lowest rate table.
The return from call-processing basic operation to normal operation is initiated by
starting recovery level ISTART1 by MML command.
The recovery actions of ISTART1B correspond largely to those of ISTART1. The differ-
ence is that in an ISTART1B the set of cyclic processes to be restarted is limited to the
group of processes that are relevant to call-processing.
Reason
An ISTART1B can be triggered by:
An ISTART1 which was not executed successfully.
Recovery action in the CP
ISTART1B causes:
Reloading of program code.
Reloading of semipermanent data.
Reloading of transient data declared with default values.
Dynamic initialization of transient data declared with non-default values (except:
memory area for call-charge data).
Restarting of cyclic processes that are relevant to call-processing.
Loading of the lowest rate table to the LTGs.
Setting of call-processing basic operation.
Recovery action in the call-processing periphery
An ISTART1B causes:
Resetting and activation of
Message buffer
Switching network
Resetting and starting of recovery:
In the LTG an initial start 1, in other words: restarting of the LTG without loading;
see section LTG Recovery.
In the DLU an initial start, in other words: the front-end LTG in B-function reloads
the DLU data and restarts the DLU system; see section DLU Recovery.
In the RSU a recovery level 2.1, in other words: restarting the RSU without loading;
see section Recovery Level 2.1.
An ISTART1B initiated by the CP does not call for any recovery action in the CCNC.
Effects of recovery
While the ISTART1B is setting basic operation:
Call-charge data registered in the CP memory are maintained.
Switching operations are interrupted (setting of call-processing basic operation after
successful completion of ISTART1B).
Established calls are cleared down.


Nailed-up connections are released, and set up again during call-processing basic
operation.
Call requests arriving in the meantime are rejected by the LTGs.
Escalation
An ISTART1B escalates to ISTART2 if the following conditions are fulfilled:
Other trouble (NSTART or ISTART1) is encountered during the defined supervision
time.
The defined error threshold has been reached; see section Escalation Strategy for
Central Recovery Levels.
4.2.3 Initial Start Level 2 (ISTART2)
The recovery level ISTART2 has a wider range of recovery action than ISTART1, includ-
ing:
Formatting of the CP memory prior to loading.
Selection of the generation to be loaded.
Loading of semipermanent data to the LTGs.
Loading of semipermanent data to the RSUs.
Selection of the system disk from which files are to be loaded.
Depending on the reason why the recovery was started, these actions may be defined
internally in the system or selected as options by the system operator:
If recovery is started using the BOOT key, the system operator can select which of
the additional actions are to be executed.
If recovery is started by MML command or by software error treatment, the system
defines which of the additional actions are to be executed. The system operator
cannot select any options (unlike start using the BOOT key).
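The distinction between system-defined and operator-selected actions can be sketched as follows. This is a minimal illustration under stated assumptions: `Istart2Options`, `SYSTEM_DEFINED` and `resolve_istart2_options` are hypothetical names, and the fixed values are taken from table Recovery actions that are optional or fixed in an ISTART2, depending on how the recovery was started. The sketch requires an explicit operator selection for a BOOT-key start; the source does not say what happens if no option is chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Istart2Options:
    format_cp_memory: bool
    generation: str              # "current" or "saved"
    load_ltg_program_code: bool
    system_disk: str             # "leading" or a disk named by the operator


# Fixed behaviour when ISTART2 is started by MML command or by fault analysis.
SYSTEM_DEFINED = Istart2Options(
    format_cp_memory=True,
    generation="current",
    load_ltg_program_code=False,
    system_disk="leading",
)


def resolve_istart2_options(started_with_boot_key: bool,
                            operator_choice: Optional[Istart2Options] = None
                            ) -> Istart2Options:
    """Return the additional actions that an ISTART2 would execute."""
    if not started_with_boot_key:
        # The operator cannot select any options; the system decides.
        return SYSTEM_DEFINED
    if operator_choice is None:
        raise ValueError("a BOOT-key start needs the operator's selection")
    return operator_choice
```

A call such as `resolve_istart2_options(False)` always yields the system-defined set, regardless of any operator selection passed in.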
Reason
An ISTART2 can be triggered by:
Manual start requested by the system operator using the BOOT key or an MML
command.
Automatic start requested by software error treatment as a result of escalation from
NSTART1B or ISTART1B to ISTART2.
Automatic start requested by hardware fault analysis after power outage.
Recovery action in the CP
The recovery actions of ISTART2 comprise:
Formatting of the CP memory.
Selection of the generation and, optionally, formatting of the CP memory.
Standard recovery action as executed under ISTART1.




Formatting of CP memory.
The CP memory is formatted prior to loading. As a result, all call charges registered
in the CP are deleted.
Call charge registration continues on the basis of the last call-charge data saved to
the system disks. These call-charge data are automatically transferred to the CP
call-charge memory area.
Selection of generation and optional formatting of CP memory.
The options for combining selection of generation and optional formatting result
in the following three cases:
1st case:
Loading of current generation and formatting
This has an effect on the registration of call charges, as described above under For-
matting of CP memory.
2nd case:
Loading of current generation without formatting
The call-charges registered in the CP memory are maintained. No further action is
required with respect to call charges.
3rd case:
Loading of a saved generation, with or without formatting
If a saved generation is loaded instead of the current generation, this means, as in
an ISTART2G, a fallback to a previous software status. Consequently, as in an
ISTART2G, additional action is required, such as:
updating the loaded generation
restoring the charge-meter counts
See ISTART2G and Maintenance Manual MMN:SW.

| Additional recovery action during ISTART2 | ISTART2 started by MML command or by software/hardware fault analysis | ISTART2 started using BOOT key |
|---|---|---|
| Formatting | The CP memory is formatted prior to loading. | Formatting of the CP memory can be selected as an option. |
| Selection of generation | No choice: The current generation is loaded. | Either the current generation or a saved generation can be loaded, as an option. |
| Loading of semipermanent data to the LTGs | Semipermanent data are loaded to the LTGs. | Semipermanent data are loaded to the LTGs. |
| Loading of semipermanent data to the RSUs | Semipermanent data are loaded to the RSUs. | Semipermanent data are loaded to the RSUs. |
| | Program code is not loaded to the LTGs. | Program code can be loaded to the LTGs, as an option. |
| Selection of system disk | No choice: The leading system disk is used for loading. | The system operator can select which system disk is to be used for loading. |

Table 7 Recovery actions that are optional or fixed in an ISTART2,
depending on how the recovery was started
Standard recovery action.
Regardless of whether memory is formatted or not and regardless of which genera-
tion is loaded, ISTART2 performs the following standard recovery action:
Reloading of program code.
Reloading of semipermanent data.
Reloading of transient data declared with default values.
Dynamic initialization of transient data declared with non-default values. (The
memory area containing call-charge data is only initialized if the CP memory was
formatted prior to loading.)
Restarting of all cyclic processes.
Resumption of switching operations after completion of the recovery action in
the call-processing periphery.
Recovery action in the call-processing periphery
An ISTART2 causes:
Resetting and activation of
Message buffer
Switching network
Resetting and starting a peripheral recovery in LTG, RSU, DLU and CCNC:
In the LTG an initial start 1 or initial start 2, in other words: reloading of
semipermanent LTG data and restarting of the LTGs; see section LTG Recovery. In the case
of an LTG with B-function, the CP also loads the semipermanent DLU data to the
LTGs. If recovery is started using the BOOT key, the system operator can also
select the option to reload program code to all LTGs. In this case, ISTART2 in the
LTG leads to an initial start 2; see section LTG Recovery.
In the DLU an initial start (only if the associated LTG is loaded), in other words: the
front-end LTGB reloads the semipermanent DLU data and restarts the DLU system;
see section DLU Recovery.
In the RSU a recovery level 2.1, in other words: reloading the semipermanent RSU
data and restarting the RSU; see section RSU Recovery.
In the CCNC an initial start for the entire CCNC, in other words: reloading of
program code and semipermanent data to the CCNC and restarting of all CCNC units;
see section CCNC Recovery.
Effects of recovery
While the ISTART2 is restoring system availability:
Switching operations are interrupted until the system has been booted successfully.
Established calls are cleared down.
Nailed-up connections are released and set up again when the system has been
restarted successfully.
Any call requests arriving in the meantime are rejected by the LTGs.
Call-charge data registered in the CP are only maintained if the CP memory was not
formatted prior to loading. If the CP memory was formatted, the last call-charge data
status saved to the system disks serves as the basis for continuation of call-charge
registration.
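The effect of an ISTART2 on call charges follows from the three cases described under Recovery action in the CP. The Python sketch below restates them as a single decision function; the function name and the returned wording are illustrative assumptions, not product output.

```python
def istart2_call_charge_outcome(generation: str, format_cp_memory: bool) -> str:
    """Describe what happens to call charges for a given ISTART2 variant.

    generation: "current" or "saved" (hypothetical labels).
    """
    if generation not in ("current", "saved"):
        raise ValueError("generation must be 'current' or 'saved'")
    if generation == "saved":
        # 3rd case: fallback to a previous software status, with or
        # without formatting.
        return ("fallback: update the loaded generation and "
                "restore the charge-meter counts")
    if format_cp_memory:
        # 1st case: current generation, CP memory formatted.
        return ("charges in CP memory deleted; registration continues "
                "from the last data saved to the system disks")
    # 2nd case: current generation, no formatting.
    return "charges in CP memory maintained; no further action required"
```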




Escalation
An ISTART2 escalates to ISTART2G if the following conditions are fulfilled:
Other trouble less than or equal to ISTART2 occurs within the defined supervision
time.
The defined error threshold has been reached; see section Escalation Strategy for
Central Recovery Levels.
4.2.4 Initial Start Level 2R (ISTART2R)
The differences between an ISTART2R and an ISTART2 are:
Program code is loaded to the LTGs under certain conditions.
Program code is loaded to the RSUs under certain conditions.
The current generation is loaded.
Recovery can only be started by MML command.
There are no recovery options, unlike ISTART2.
Reason
An ISTART2R can be triggered by:
MML command.
Escalation of a previous ISTART with total failure of the periphery.
Recovery action in the CP
An ISTART2R causes:
Reloading of program code.
Reloading of semipermanent data.
Reloading of transient data declared with default values.
Dynamic initialization of transient data declared with non-default values (except: the
memory area containing call-charge data).
Restarting of all cyclic processes.
Resumption of switching operations after completion of recovery action in the call-
processing periphery.
Recovery action in the call-processing periphery
An ISTART2R causes:
Resetting and activation of:
Message buffer
Switching network
Resetting and start of peripheral recovery in LTG, RSU, DLU and CCNC:
In the LTG an initial start 2, in other words: conditional reloading of LTG program
code, semipermanent LTG data and restarting of the LTGs; see section LTG Recovery.
For an LTG in B-function with DLU, the CP also loads the semipermanent DLU
data to the LTG.
In the DLU an initial start, in other words: the front-end LTG in B-function reloads
the semipermanent DLU data and restarts the DLU system; see section DLU Recovery.
In the RSU a recovery level 2.2, in other words: reloading of RSU program code
and semipermanent RSU data and restarting the RSU; see section RSU Recovery.
In the CCNC an initial start for the entire CCNC, in other words: reloading of
program code and semipermanent data to the CCNC and restarting of all CCNC units; see
section CCNC Recovery.


Effects of recovery
While the ISTART2R is restoring system availability:
Switching operations are interrupted until the system has been successfully booted.
Established calls are cleared down.
Nailed-up connections are released and set up again after the system has been suc-
cessfully started.
Any call requests arriving in the meantime are rejected by the LTGs.
Call-charge data registered in the CP are maintained.
Escalation
An ISTART2R escalates to ISTART2G if the following conditions are fulfilled:
Other trouble less than or equal to ISTART2 is encountered within the defined
supervision time.
The defined error threshold has been reached; see section Escalation Strategy for
Central Recovery Levels.
4.2.5 Initial Start Level 2G (ISTART2G)
The actions and effects of an ISTART2G recovery correspond to those of an ISTART2R
with the main difference that:
The CP memory is formatted prior to loading; consequently, call charges registered
in the CP are deleted.
A saved generation is loaded instead of the current generation.
The loading and subsequent use of a saved generation means fallback to a previous
software status of the network node.
Fallback to a saved generation is used when recovery level ISTART2 or ISTART2R with
the current generation could not be executed successfully in the CP. This situation leads
to an escalation to ISTART2G: the ISTART2G starts automatically, selects the most
recent saved generation from the generation list and attempts to start up the system
using this saved generation.
The generation list contains all of the generations that can be used in the network node.
The first position in the list is the current generation. This is followed by the saved gen-
erations, sorted according to the date of saving. The different types of saved generation
are:
Backup generation, produced during a routine save.
Golden generation, produced during a quarterly save. The golden generation has
been start-tested and represents the saved software status of the network node.
Reason
An ISTART2G can be triggered automatically or manually:
Automatic start through escalation from ISTART2 or ISTART2R.
Manual start via MML command.
Recovery action in the CP
As for ISTART2R.
And in addition:
Formatting of CP memory prior to loading.
Dynamic initialization of the call-charge memory area.




Recovery action in the call-processing periphery
As for ISTART2R.
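The peripheral recovery actions stated for the individual ISTART levels in this section can be collected in one lookup table. The sketch below is a reading aid only: the dictionary name `PERIPHERAL_RECOVERY` and the helper function are hypothetical, and the entries simply repeat the recovery levels named in the text for each unit (`None` means that no recovery action is called for).

```python
# Peripheral recovery per system recovery level, as named in this section.
PERIPHERAL_RECOVERY = {
    "ISTART1": {"LTG": "initial start 1", "DLU": "initial start",
                "RSU": "recovery level 2.1", "CCNC": None},
    "ISTART1B": {"LTG": "initial start 1", "DLU": "initial start",
                 "RSU": "recovery level 2.1", "CCNC": None},
    "ISTART2": {"LTG": "initial start 1 or initial start 2",
                "DLU": "initial start",
                "RSU": "recovery level 2.1", "CCNC": "initial start"},
    "ISTART2R": {"LTG": "initial start 2", "DLU": "initial start",
                 "RSU": "recovery level 2.2", "CCNC": "initial start"},
}
# ISTART2G: as for ISTART2R.
PERIPHERAL_RECOVERY["ISTART2G"] = dict(PERIPHERAL_RECOVERY["ISTART2R"])


def levels_with_ccnc_recovery():
    """Return the ISTART levels that include recovery action in the CCNC."""
    return sorted(level for level, units in PERIPHERAL_RECOVERY.items()
                  if units["CCNC"] is not None)


print(levels_with_ccnc_recovery())  # ['ISTART2', 'ISTART2G', 'ISTART2R']
```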
Effects of recovery
As for ISTART2R.
Except: call charges, see section Updating the loaded saved generation to the current
software status of the network node.
Escalation
If startup in the CP with the most recent saved generation is not successful, then
ISTART2G selects the next saved generation. This is usually the start-tested golden
generation. If, unexpectedly, startup is not successful with the golden generation either,
selection from the generation list starts at the beginning again with the current genera-
tion.
Startup attempts continue until startup is successful with one of the generations or the
system operator intervenes: terminating the ISTART2G and taking the action described
in the Maintenance Manual for Software.
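The selection of generations described above can be sketched as a loop over the generation list. This is an illustration under assumptions, not the product algorithm: `try_startup` is a hypothetical callback that reports whether startup in the CP succeeded, `max_attempts` stands in for operator intervention, and the golden generation is assumed to be the last entry in the list.

```python
from itertools import cycle


def istart2g_startup(generation_list, try_startup, max_attempts):
    """Try saved generations in turn, wrapping round to the current generation.

    generation_list[0] is the current generation; the saved generations
    follow, most recent first. Returns the generation that started
    successfully, or None if max_attempts is exhausted.
    """
    # Begin with the most recent saved generation; after the last entry,
    # selection starts again at the beginning with the current generation.
    order = generation_list[1:] + generation_list[:1]
    for _, generation in zip(range(max_attempts), cycle(order)):
        if try_startup(generation):
            return generation
    return None


generations = ["current", "backup", "golden"]
print(istart2g_startup(generations, lambda g: g == "golden", max_attempts=10))
# prints: golden
```

Bounding the loop with `max_attempts` is a design choice of the sketch only; the text above describes attempts continuing until startup succeeds or the system operator intervenes.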
Updating the loaded saved generation to the current software status of the
network node
After a successful startup, the loaded saved generation must be updated. This involves:
Incorporating LOG files in the loaded saved generation. LOG files contain the mod-
ifications representing the difference between the saved generation and the current
generation.
Restoring the charge-meter counts in the CP.
The steps to be taken are described in the Maintenance Manual for Software.
4.3 Escalation Strategy for Central Recovery Levels
CP processes have a choice of two methods of fault neutralization, depending on the
severity and side-effects of the software error:
Independent software error neutralization.
The CP process starts its own error neutralization routine. The CP process instructs
software error treatment to collect error symptoms by sending a SASDATS call; see
SASDATS (saving of software data with statistics) later in this section.
Use of recovery for software error neutralization.
The CP process sends an SWSG call to software error treatment, which then starts
an NSTART recovery level.
Escalation and escalation parameters
Regardless of whether the process which detected the error sends an SWSG call or an
SASDATS call, a supervision timer is started in both cases: software error treatment
watches for the occurrence of further SWSG or SASDATS calls during the supervision
time.
The duration of the supervision time varies according to the selected recovery level. In
addition, an error threshold is associated with each supervision time; see table Central
recovery levels and their escalation parameters.


Software error treatment uses the supervision time and the error threshold to decide
whether to initiate escalation to a higher recovery level if further trouble should be
detected.
If further errors (less than or equal to the currently running recovery level) occur during
a defined supervision time to such an extent that the defined error threshold is reached,
then software error treatment initiates escalation to the next higher recovery level.
Table Central recovery levels and their escalation parameters shows the escalation
parameters for each recovery level: the duration of the supervision time, the error
threshold and the next higher recovery level.
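The escalation rule can be sketched as a table-driven lookup. The values below are taken from table Central recovery levels and their escalation parameters; everything else is an assumption made for illustration: the names `ESCALATION_TABLE` and `next_level`, the representation of errors as times in seconds since the supervision timer started, a fixed (non-restarting) supervision window, and the use of the minimum value of 10 minutes for ISTART2, whose supervision time is variable. The generation-specific ISTART2G rows are omitted.

```python
# level: (supervision time in s, error threshold, next higher level)
ESCALATION_TABLE = {
    "SASDATS": (80, 40, "NSTART0"),
    "NSTART0": (600, 3, "NSTART1"),
    "NSTART1": (600, 2, "NSTART3"),
    "NSTART2": (600, 2, "NSTART3"),
    "NSTART3": (600, 2, "ISTART1"),
    "ISTART1": (600, 2, "ISTART1B"),
    "ISTART1B": (600, 2, "ISTART2"),
    "NSTART1B": (60, 10, "ISTART2"),
    "ISTART2": (600, 2, "ISTART2G"),
}

# Levels that go to NSTART1B when certain system files cannot be accessed,
# independently of the error threshold.
FILE_ACCESS_FALLBACK = {"NSTART2", "NSTART3", "ISTART1"}


def next_level(level, error_times_s, files_accessible=True):
    """Return the level to escalate to, or None if no escalation is due."""
    supervision_s, threshold, target = ESCALATION_TABLE[level]
    if level in FILE_ACCESS_FALLBACK and not files_accessible:
        return "NSTART1B"
    # Count only the errors that fall within the supervision time.
    counted = [t for t in error_times_s if 0 <= t <= supervision_s]
    return target if len(counted) >= threshold else None
```

For instance, `next_level("NSTART0", [30, 200, 599])` yields `"NSTART1"`, because three errors fall within the 10-minute supervision time, whereas `next_level("NSTART0", [30, 700])` yields `None`.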
SASDATS (saving of software data with statistics)
A SASDATS call is not, strictly speaking, a recovery, and is only included here in
connection with the escalation strategy.
In response to a SASDATS call, software error treatment simply collects error symptoms
and watches to see whether any more SASDATS calls are received within the defined
supervision time.
SASDATS calls do not have any effect on switching operations or on data. The CP
process that detected the error neutralizes the software itself and continues to run after
the SASDATS call.
| SASDATS or recovery level | Supervision time | Error threshold | Escalation to |
|---|---|---|---|
| SASDATS | 80 s | 40 | NSTART0 |
| NSTART0 | 10 min | 3 | NSTART1 |
| NSTART1 | 10 min | 2 | NSTART3 |
| NSTART2 | 10 min | 2 | NSTART3 or NSTART1B 1) |
| NSTART3 | 10 min | 2 | ISTART1 or NSTART1B 1) |
| ISTART1 | 10 min | 2 | ISTART1B or NSTART1B 1) |
| ISTART1B | 10 min | 2 | ISTART2 |
| NSTART1B | 60 s | 10 | ISTART2 |
| ISTART2 | 10 min 2) | 2 | ISTART2G (startup with backup generation) |
| ISTART2G (startup with backup generation) | 10 min 2) | 2 | ISTART2G (startup with golden generation) |
| ISTART2G (startup with golden generation) | 10 min 2) | 1 | ISTART2G 3) (startup with next higher generation) |
Table 8 Central recovery levels and their escalation parameters
Notes relating to table Central recovery levels and their escalation parameters:
1) Escalation to NSTART1B is independent of the error threshold. It is executed in
exceptional cases where access to certain system files is not possible at the time of
recovery.
2) This supervision time is variable: minimum 10 minutes, maximum until the loading
of the entire call-processing periphery has been completed.
3) If startup with the golden generation in the CP is not successful, then ISTART2G
initiates another startup attempt with the next generation in the generation list; see
section Initial Start Level 2G (ISTART2G).
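The escalation rule behind Table 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CP implementation: the names ESCALATION, Supervisor and report_error are hypothetical, the NSTART1B exception of note 1) and the ISTART2G rows are left out, and the sketch assumes that the call which started supervision counts toward the error threshold (the text does not say so).

```python
from dataclasses import dataclass

# Level -> (supervision time in seconds, error threshold, next level), from Table 8.
# The NSTART1B branch (note 1) and the ISTART2G rows are left out for brevity.
ESCALATION = {
    "SASDATS": (80, 40, "NSTART0"),
    "NSTART0": (600, 3, "NSTART1"),
    "NSTART1": (600, 2, "NSTART3"),
    "NSTART3": (600, 2, "ISTART1"),
    "ISTART1": (600, 2, "ISTART1B"),
    "ISTART1B": (600, 2, "ISTART2"),
}


@dataclass
class Supervisor:
    level: str
    started_at: float
    errors: int = 1  # assumption: the call that started supervision counts

    def report_error(self, now: float) -> str:
        """Register a further SWSG or SASDATS call; return the level now in force."""
        if self.level not in ESCALATION:
            return self.level  # level not modelled in this sketch
        window, threshold, next_level = ESCALATION[self.level]
        if now - self.started_at > window:
            # Supervision time has expired: begin a new supervision period.
            self.started_at, self.errors = now, 1
            return self.level
        self.errors += 1
        if self.errors >= threshold:
            # Threshold reached within the supervision time: escalate.
            self.level, self.started_at, self.errors = next_level, now, 1
        return self.level
```

Under these assumptions, with NSTART0 running, a second error leaves the level unchanged, and a third error within the 10-minute supervision time escalates to NSTART1.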
5 Recovery in the Call-processing Periphery
A recovery in the call-processing periphery can take place in the following subsystems:
Remote switching unit (RSU).
Line/trunk group (LTG).
Digital line unit (DLU).
Media control platform (MCP).
Media control task (MCT).
Common channel signaling network control (CCNC).
The call-processing subsystems have the following points in common with respect to
recovery (exceptions are specified explicitly):
Software of the subsystems
The software for the subsystems is loaded from the coordination processor (CP) or from
the packet control unit (PCU) serving as DHCP server in case of the MCP. In the case
of MCP, MCT, LTG, (possibly) RSU and CCNC, program code and data are loaded; only
data are loaded to the DLU.
Occurrence of a fault
If a fault occurs in a subsystem during operation (software faults and also certain
hardware faults), a recovery is initiated in this subsystem.
Clearance of the fault
The recovery that runs in the subsystem clears the fault by means of recovery actions.
In serious cases, this means reloading program code and data.
Selection of the suitable recovery actions
Each recovery is divided into recovery levels. Depending on the extent and scope of the
fault, the fault-detecting program selects a recovery level which, using specific recovery
actions, immediately restores the subsystem to full performance.
Supervisory measures and escalation
The operational sequence of the recovery level started is monitored by the respective
safeguarding programs. If the recovery level started is not successful, the result is
escalation of the recovery level: safeguarding programs start the next-highest recovery
level, which initiates more extensive recovery actions in the subsystem.
Effects of recovery on the switching operation of a subsystem
The majority of recovery cases that can occur in a subsystem initiate the lowest level of
a recovery; the resultant recovery actions do not lead to any noticeable effects on
switching operation as far as the subscriber is concerned.
Starting options
The start of a recovery in the call-processing periphery can be initiated by the following:
The subsystem itself (internally).
The CP.
Recovery is initiated by the subsystem itself (internally) in the following cases:
A fault-detecting (safeguarding) program which calls a recovery level in the
subsystem.
The safeguarding programs of the subsystem when escalation of the recovery levels
takes place.




Recovery is initiated from the CP by means of system recovery which initiates a
recovery in the call-processing periphery.
Additional dependability through redundant hardware units
In the CCNC, there is redundancy of the functional units for basic tasks. If a fault occurs
in one of the redundant functional units, the second functional unit takes over the tasks
to be performed.
A DLU consists of two DLU systems which, for load-sharing reasons, are connected to
two different LTGs or to four different LTGs (DLUG). If a fault occurs in either one of
the LTGs, then only one of the DLU systems is affected, that is, at most 50% of the
total DLU.
5.1 LTG Recovery
From a functional point of view, LTG recovery is organized in three recovery levels and
an escalation strategy. The LTG restart is briefly explained at the end of this section.
The main recovery effects of the LTG recovery levels are summarized in the brief
overview presented in table Brief overview of the main recovery effects of the LTG
recovery levels.
| Recovery level | Reason | Effects of recovery on the switching operation in the LTG | With an LTG operating in the B-function: effects of recovery on the DLU system |
|---|---|---|---|
| LTG new start | LTG-internal reason, for example, software fault | LTG-internal switching operation is interrupted briefly: existing calls are maintained; the call-charge units that have accumulated in the LTG for existing calls are not lost; calls in the process of being set up or cleared down are aborted | If conditions are unfavorable, an LTG new start initiates a DLU recovery initial start in the DLU system connected. The second DLU system takes over the DLU call-processing functions; see section DLU Recovery. |
| LTG initial start 1, LTG initial start 2 | LTG-internal reason; LTG-external reason, for example system recovery which runs in the CP initiates recovery in the LTG | LTG-internal switching operation is interrupted until the recovery level has been completed: calls in the process of being set up or cleared down are aborted; existing calls are cleared down; call-charge units that have accumulated in the LTG for existing calls are lost (at most) for the time specified for the intermediate call duration | An LTG initial start 1 or 2 initiates a DLU recovery initial start in the DLU system connected. The second DLU system takes over the DLU call-processing functions; see section DLU Recovery. |
Table 9 Brief overview of the main recovery effects of the LTG recovery levels
5.1.1 Recovery Level 1 (LTG New Start)
LTG new start becomes active when a software fault occurs which affects several LTG
user programs or when LTG restarts escalate; for details on LTG restart, see the
escalation strategy described in section Escalation Strategy.
A software fault can, for example, take the form of an internal chaining error in the
transient data of the LTG executive control program.
Reason
LTG new start can be initiated by the following:
Call from LTG user programs.
Call from LTG safeguarding programs.
Escalation of the LTG restarts.
Recovery actions
LTG new start initiates:
Setting-up the preconditions for resumption of switching operation.
Resumption of switching operation in the LTG by starting the executive control
program.
In an LTG operating in the B-function, resumption of communication with the DLU
system connected.
Effects of recovery
While LTG new start clears the fault situation:
The existing calls are maintained.
The call-charge units that have accumulated in the LTG for existing calls are
retained.
LTG-internal switching operation is interrupted briefly.
The calls in the process of being set up or cleared down are aborted and lost.
Recovery effect of an LTG operating in the B-function on the DLU system connected
LTG new start can lead to a DLU recovery of initial start recovery level in the DLU
system connected. This occurs when critical communication is taking place between the
LTG operating in the B-function and the DLU system connected and this communication
must now be aborted due to the LTG new start. In this case, the LTG operating in the
B-function starts a DLU recovery of initial start recovery level in the DLU system
connected; see section DLU Recovery.
Within the LTG operating in the B-function, DLU maintenance initiates DLU recovery
and monitors its course in the DLU system connected. While the DLU recovery started
by the upstream LTG operating in the B-function is running in the DLU system
connected, the second (redundant) DLU system takes over the DLU call-processing
functions which arise.
Once the DLU recovery has been completed, the LTG operating in the B-function and
the DLU system connected resume communication and switching operation.
Recovery escalation
A number of LTG new starts can escalate to an LTG initial start 1; see section Escalation
Strategy.
5.1.2 Recovery Level 2.1 (LTG Initial Start 1)
The reason for an LTG initial start 1 can be LTG-internal or LTG-external. An
LTG-internal reason may arise, for example, if LTG safeguarding programs detect that
the checksums of the semipermanent LTG data no longer agree.
Reason
LTG initial start 1 can be initiated by the following:
Call from LTG safeguarding programs.
Escalation of LTG new start.
System recovery which runs in the CP under recovery level ISTART1 or ISTART2.
Recovery actions
LTG initial start 1 initiates:
Resetting of the LTG.
Reloading of the semipermanent data from the CP (no reloading in case of
ISTART1).
Setting-up the preconditions for resumption of switching operation in the LTG.
Resumption of switching operation in the LTG by starting the executive control
program.
In an LTG operating in the B-function, resumption of communication with the DLU
system connected.
Effects of recovery
While LTG initial start 1 clears the fault:
LTG-internal switching operation is interrupted.
The calls in the process of being set up or cleared down are aborted and lost.
The existing calls are cleared down.


The nailed-up connections are likewise cleared down and then set up again in the
course of resumption of switching operation.
The call-charge units that have accumulated in the LTG for existing calls are lost; at
most for the time specified for the intermediate call duration.
Recovery effect of an LTG operating in the B-function on the DLU system connected
The LTG operating in the B-function starts a DLU recovery initial start in the DLU system
connected. While the DLU recovery started by the upstream LTG operating in the B-
function is running in the DLU system connected, the second (redundant) DLU system
takes over the DLU call-processing functions which arise.
A DLU recovery (for the DLU system connected) is initiated within the LTG operating in
the B-function by the DLU maintenance and its course is monitored. Once the DLU
recovery has been completed, the LTG operating in the B-function and the DLU system
connected restore communication.
Recovery escalation
A number of LTG initial starts 1 can escalate to an LTG initial start 2; see section
Escalation Strategy.
5.1.3 Recovery Level 2.2 (LTG Initial Start 2)
LTG initial start 2 is the highest recovery level within the LTG.
LTG initial start 2 is the same as an LTG initial start 1, but with an additional recovery
action which takes the form of unconditional or conditional reloading of LTG program
code:
Unconditional loading of LTG program code takes place in conjunction with a system
recovery of recovery level ISTART2F.
Conditional reloading of LTG program code takes place if the LTG initial start 2 is
initiated (LTG-)internally as follows:
By escalation of an LTG initial start 1.
In conjunction with a system recovery of recovery level ISTART2R or
ISTART2G.
In conjunction with an ISTART2 if reloading of LTG program code is specified.
(In the case of an ISTART2 that has been started via the BOOT key, the system
operator is able to specify that all LTGs are to be reloaded with program code;
see ISTART2 in section System Recovery.)
In these cases, the LTG initial start 2 first of all checks the checksum of the code
segments. If a checksum is correct, the code segment in question is not loaded. If
not, the code segment is reloaded from the CP.
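The conditional reloading described above amounts to a per-segment checksum comparison. The following Python sketch illustrates the idea only: the function name segments_to_reload and the segment names are hypothetical, and CRC-32 is used as a stand-in because the source does not state which checksum algorithm the LTG uses.

```python
import zlib


def segments_to_reload(loaded: dict[str, bytes], expected: dict[str, int]) -> list[str]:
    """Return the names of the code segments whose checksum is not correct."""
    return [
        name
        for name, code in loaded.items()
        # A segment with a correct checksum is left alone; any other
        # segment has to be reloaded from the CP.
        if zlib.crc32(code) != expected.get(name)
    ]
```

Only the segments returned by the function would be reloaded; segments with a correct checksum stay in place, which is what distinguishes conditional from unconditional reloading.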
Reason
LTG initial start 2 can be initiated by the following:
Escalation of LTG initial start 1.
System recovery in recovery levels ISTART2R and ISTART2G, or an ISTART2 (when
reloading of LTG program code is specified).
Recovery actions
LTG initial start 2 initiates:
Resetting of the LTG.




Reloading of program code from the CP (unconditional or conditional reloading, as
described above).
Reloading of the semipermanent data from the CP.
Setting-up the preconditions for resumption of switching operation in the LTG.
Resumption of switching operation in the LTG by starting the executive control
program.
In an LTG operating in the B-function, resumption of communication with the DLU
system connected.
Effects of recovery
As for LTG initial start 1.
Recovery effect of an LTG operating in the B-function on the DLU system connected
As for LTG initial start 1.
Recovery escalation
No further escalation level.
If recovery level LTG initial start 2 is not completed successfully (for example, LTG
failure during recovery), CP safeguarding programs make two further attempts to return
the LTG to service.
If the LTG causes three LTG initial starts 2 within a period of 60 minutes, CP
safeguarding programs reconfigure this LTG to the operating state unavailable. The
subsequent actions are carried out by the maintenance personnel.
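The 60-minute rule can be pictured as a sliding-window counter. The sketch below is a hedged illustration, not the CP safeguarding code: the class name InitialStart2Monitor, the method name and the state name "active" are hypothetical, and the treatment of a start that is exactly 60 minutes old is an assumption.

```python
from collections import deque


class InitialStart2Monitor:
    WINDOW_S = 60 * 60  # 60 minutes, from the text
    LIMIT = 3           # three LTG initial starts 2, from the text

    def __init__(self) -> None:
        self._starts: deque = deque()
        self.state = "active"  # hypothetical name for the normal state

    def record_initial_start_2(self, now: float) -> str:
        """Record one LTG initial start 2 at time `now` (seconds)."""
        self._starts.append(now)
        # Forget starts that lie outside the 60-minute window.
        while now - self._starts[0] > self.WINDOW_S:
            self._starts.popleft()
        if len(self._starts) >= self.LIMIT:
            self.state = "unavailable"
        return self.state
```

Once the state is "unavailable", the sketch never resets it, mirroring the statement that the subsequent actions are left to the maintenance personnel.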
5.1.4 Escalation Strategy
The LTG programs can choose between two error-clearance options, depending on
the severity and extent of the software error:
Clearing software errors themselves.
The error-detecting LTG program starts a fault-clearance routine. The error-detecting
LTG program passes on the task of saving the error symptoms (entering them
into the LTG error notebook) to the LTG safeguarding programs by means of the
restart call; see section LTG restart below.
Having software errors cleared by means of recovery.
The error-detecting LTG program sends a (recovery) call to the LTG safeguarding
programs, naming a recovery level which is started by the LTG safeguarding
programs for the purpose of clearing the software error.
Escalation and escalation parameters
Irrespective of whether the error-detecting LTG program starts a restart call or a
recovery call, a supervision period begins in both cases: the LTG safeguarding
programs monitor whether any further restart or recovery calls arrive during the
supervision period.
The duration of the supervision period depends on the recovery level started. In addition,
an error threshold limit is specified for each supervision period; see table LTG recovery
levels and their escalation parameters.
On the basis of this supervision period and the error threshold limit, the LTG
safeguarding programs decide, when further errors occur, whether an escalation to a
higher recovery level should be initiated or not.


Escalation takes place if the following conditions are fulfilled:
Further errors occur while the recovery level is running or during the subsequent
supervision period.
The error threshold limit is reached.
Table LTG recovery levels and their escalation parameters contains the escalation
parameters for each recovery level. These are: the duration of the supervision period,
the error threshold limit, and the next-highest recovery level.
LTG restart
The LTG restart is not a case of recovery and is explained here purely in the context of
the escalation strategy.
The LTG restart is called if LTG user programs or LTG safeguarding programs detect a
software error which they are able to clear themselves. The software error can, for
example, take the form of an inconsistency in the data. The error symptoms are written
into the LTG error notebook.
A restart call has no effect on either LTG switching operation or data.
After the restart call, the error-detecting LTG program continues its run. No further
actions are required.
5.2 DLU Recovery
To begin with, we will briefly explain the mode of operation of a DLU, to allow us to gain
an understanding of both DLU Recovery and the functional organization of the DLU.
The DLU splits up switching traffic (to the LTG operating in the B-function) between two
internal DLU systems (the X path and the Y path) in order to share the load. These
two DLU systems operate independently of each other; for safeguarding reasons, they
are connected to up to four different LTGs operating in the B-function. If a recovery
takes place in one of the DLU systems, the second DLU system continues operating
without any problem and alone processes all the traffic that occurs, by means of its own
LTG operating in the B-function.
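The takeover behavior described above can be illustrated with a short Python sketch. It rests on assumptions: the function name select_dlu_system, the labels "DLU system 0" and "DLU system 1" and the simple alternation rule are hypothetical. The source states only that the traffic is shared between two DLU systems and that the second system carries all traffic while the other one is in recovery.

```python
DLU_SYSTEMS = ("DLU system 0", "DLU system 1")  # hypothetical labels


def select_dlu_system(call_number: int, in_recovery: set[str]) -> str:
    """Pick the DLU system that handles a call, skipping systems in recovery."""
    usable = [s for s in DLU_SYSTEMS if s not in in_recovery]
    if not usable:
        raise RuntimeError("no DLU system available")
    # Assumption: calls simply alternate between the usable systems.
    return usable[call_number % len(usable)]
```

With both systems usable the calls alternate; while one system is listed in in_recovery, every call is routed to the other one.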
From a functional point of view, DLU recovery is organized in two recovery levels and
an escalation strategy.
| LTG restart or LTG recovery level | Supervision time | Error threshold limit | Escalation to |
|---|---|---|---|
| LTG restart | 5 min | 10 | LTG recovery level 1 (new start) |
| LTG recovery level 1 (new start) | 5 min | 5 | LTG recovery level 2.1 (initial start 1) |
| LTG recovery level 2.1 (initial start 1) | 60 min | 3 | LTG recovery level 2.2 (initial start 2) |
| LTG recovery level 2.2 (initial start 2) | 60 min | 3 | See recovery escalation of LTG initial start 2 in section Recovery Level 2.2 (LTG Initial Start 2) |
Table 10 LTG recovery levels and their escalation parameters
5.2.1 Recovery Level New Start
DLU new start becomes active if safeguarding programs of the DLU system detect a
software error which affects a call that is in the process of being set up or cleared down.
The software error can, for example, take the form of address inconsistencies in memory
areas used on a temporary basis.
Reason
DLU new start can be initiated by the following:
Call from the DLU safeguarding programs.
Escalation of the DLU restarts; for details on DLU restart, see the DLU recovery
escalation strategy described in section Escalation Strategy.
Recovery actions
DLU programs are able to clear the software error themselves. No further actions are
necessary.
Effects of recovery:
Those calls in the process of being set up or cleared down are lost.
No further effects; DLU switching operation continues without any problems.
Recovery escalation
A number of DLU new starts can escalate to a DLU initial start; see DLU escalation
strategy.
5.2.2 Recovery Level Initial Start
Initial start is the highest recovery level in a DLU system. A DLU initial start recovery is
started and its course is monitored by the upstream LTG operating in the B-function (and
within the LTG operating in the B-function by the DLU maintenance).
Reason
DLU initial start recovery can be initiated by the following:
Escalation of the DLU new starts in the DLU system.
LTG initial start recovery which runs in the upstream LTG operating in the B-function
under recovery level 2.1 or 2.2; in unfavorable situations, an LTG new start recovery
can also cause initiation.
Recovery actions
DLU initial start recovery carries out the following actions in the DLU system affected:
Resetting of the controller of the DLU system.
Reloading of the semipermanent DLU data (from the LTG database) into the DLU
system.
Re-synchronization of the digital trunks between the DLU system and the LTG
operating in the B-function.
New start of the controller of the DLU system.
Resumption of switching operation in the DLU system affected.
Effects of recovery
A DLU initial start recovery has the following effects on the DLU system affected:
The calls that are in the process of being set up or cleared down in the DLU system
are aborted and lost.


The existing calls that are running over the DLU system are cleared down.
The nailed-up connections that are running over the DLU system are likewise
cleared down and then set up again in the course of resumption of switching
operation.
While a DLU initial start recovery is running in a DLU system, the second DLU system
takes over the DLU call-processing functions that arise.
Recovery escalation
The DLU maintenance (in the LTG operating in the B-function) monitors the course of
the recovery actions in the DLU system. If a fault occurs during the initial start, the DLU
maintenance undertakes a maximum of two further attempts to execute an initial start in
the DLU system. If these two attempts are also unsuccessful, the DLU system is
reconfigured to disturbed. The subsequent actions are carried out by the maintenance
personnel.
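The retry rule (one initial start plus a maximum of two further attempts) can be sketched as follows. The function name dlu_initial_start, the callable attempt and the result "in service" are hypothetical; only the limit of two further attempts and the state disturbed come from the text.

```python
from typing import Callable


def dlu_initial_start(attempt: Callable[[], bool], further_attempts: int = 2) -> str:
    """Run the initial start and, on failure, up to `further_attempts` retries."""
    for _ in range(1 + further_attempts):
        if attempt():
            return "in service"  # hypothetical name for the successful outcome
    # All attempts failed: the DLU system is reconfigured to disturbed.
    return "disturbed"
```

Here attempt stands for one complete execution of the initial start and returns True on success, so the DLU system is marked disturbed only after three failed executions in total.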
5.2.3 Escalation Strategy
The DLU programs can choose between two options, depending on the severity and
extent of the software error, for the purpose of error clearance:
Clearing software errors themselves.
The error-detecting DLU program starts an error-clearance routine. The error-
detecting DLU program passes on the task of saving the error symptoms (entering
them into the DLU error notebook) to the DLU safeguarding programs by means of
the restart call; see section entitled DLU restart below.
Having software errors cleared by means of recovery.
The error-detecting DLU program sends a recovery call to the DLU safeguarding
programs, which in turn start a DLU recovery level for the purpose of clearing the
software error.
Escalation and escalation parameters
Irrespective of whether the error-detecting DLU program issues a restart call or a
recovery call, a supervision period begins in both cases: the DLU safeguarding
programs and the DLU maintenance of the LTG operating in the B-function monitor
whether any further restart or recovery calls arrive during the supervision period.
The duration of the supervision period depends on the recovery level started. In addition,
an error threshold limit is specified for each supervision period; see table DLU recovery
levels and their escalation parameters.
On the basis of this supervision period and the error threshold limit, the DLU
safeguarding programs decide, when further faults occur, whether an escalation to a
higher recovery level should be initiated or not.
| DLU restart or DLU recovery level | Supervision time | Error threshold | Escalation to |
|---|---|---|---|
| DLU restart | 60 min | 10 | DLU new start recovery |
| DLU new start recovery | 30 min | 2 | DLU initial start recovery |
| DLU initial start recovery | | | none 1) |
Table 11 DLU recovery levels and their escalation parameters
Escalation takes place if the following situation arises:
further errors occur while the recovery level is running or during the subsequent
supervision period and
the error threshold limit is reached
Table DLU recovery levels and their escalation parameters contains the escalation
parameters for each recovery level. These are: the duration of the supervision period,
the error threshold limit, and the next-highest recovery level.
DLU restart
The DLU restart is not a case of recovery and is explained here purely in the context of
the escalation strategy.
The DLU restart is called if DLU programs detect a software fault which they are able to
clear themselves. The software error can, for example, be an inconsistency in the data.
A restart call has no effect on either DLU switching operation or data.
After the restart call, the fault-detecting DLU program continues its run. No further
actions are required.
5.3 CCNC Recovery
To begin with, we will briefly explain the mode of operation and structure of a CCNC, to
allow us to gain an understanding of both CCNC recovery and the functional
organization of a CCNC.
Operating mode of the CCNC
Operation of a CCNC system is safeguarded by two processors in standby mode. These
two CCNC processors are referred to below by their abbreviated names: CCNP0 and
CCNP1. One of the two CCNPs is active (CCNP0 or CCNP1) and carries out the SS7
signaling tasks which arise in switching operation. The other CCNP is in the meantime
on standby and is updated at periodic intervals by the active CCNP.
If a serious error occurs in the active CCNP, CP safeguarding programs switch over to
the CCNP that is on standby, which now becomes the active CCNP and is immediately
able to continue carrying out the SS7 signaling tasks.
Structure of the CCNC
Within the CCNC, a maximum of 32 signaling link terminal controls (SILTCs) are
connected to the two CCNPs. For security reasons, each SILTC is linked to both CCNPs.
Furthermore, up to 8 SILTDs (signaling link terminals digital) can be connected to each
SILTC. One SILTC and the SILTDs connected together make up an SILT group.
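The structural limits can be written down as a small consistency check. This Python sketch is illustrative only: the function name check_ccnc_layout and the dictionary layout (SILTC name mapped to the list of its SILTDs) are hypothetical; the two limits are taken from the text.

```python
MAX_SILTC = 32           # maximum number of SILTCs per CCNC (from the text)
MAX_SILTD_PER_SILTC = 8  # maximum number of SILTDs per SILTC (from the text)


def check_ccnc_layout(silt_groups: dict[str, list[str]]) -> int:
    """Validate a CCNC layout and return the total number of SILTDs."""
    if len(silt_groups) > MAX_SILTC:
        raise ValueError("too many SILTCs")
    for siltc, siltds in silt_groups.items():
        if len(siltds) > MAX_SILTD_PER_SILTC:
            raise ValueError(f"{siltc}: too many SILTDs")
    return sum(len(siltds) for siltds in silt_groups.values())
```

A fully equipped CCNC therefore holds at most 32 x 8 = 256 SILTDs in 32 SILT groups.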
Software of the CCNC
The two CCNPs contain program code and semipermanent data, whereas the SILTCs
and SILTDs contain only semipermanent data.
Notes relating to table DLU recovery levels and their escalation parameters:
1) DLU maintenance in the LTGB makes a maximum of two further attempts to start a
DLU initial start. If both attempts are unsuccessful, the DLU system is configured to
disturbed. Further actions are carried out by the maintenance personnel.


Recovery in the CCNC
A recovery can run in the following CCNC units:
SILTD signaling link terminal digital.
SILT group encompasses the SILTC (signaling link terminal control) and the SILTDs
connected to it.
CCNP0, CCNP1, common channel signaling network processors.
From a functional point of view, CCNC recovery is organized into various recovery levels
and an escalation strategy.
5.3.1 Recovery Level 2, Initial Start for an SILTD, Initial Start for an
SILT Group
The two CCNC initial starts differ only slightly; for simplicity, they are described jointly
below.
An initial start for an SILTD or SILT group becomes active if CCNC programs are not
able to clear the fault themselves or if the CCNC restarts escalate; for details on CCNC
restart, see section Escalation Strategy.
Reason
Any of the CCNC initial start levels can be initiated by the following:
CCNC user programs.
CCNC safeguarding programs.
Escalation of the restarts for SILTD or for SILTC.
Recovery actions with an initial start for an SILTD
The initial start activates:
Initialization of the SILTD.
Reloading of the semipermanent data for the SILTD from the active CCNP.
Checking of the digital signaling links; any faulty signaling links are activated.
Recovery actions with an initial start for an SILT group
The initial start activates:
Initialization of the SILTC.
Initialization of all SILTDs (of the SILT group).
Reloading of the semipermanent data for the SILTC from the active CCNP.
Reloading of the semipermanent data for all SILTDs (of the SILT group) from the
active CCNP.
Checking of the digital signaling links; any faulty signaling links are activated.
Effects of recovery
MSUs (message signal units) can be lost when an initial start (for SILTD or SILT group)
is run.
SS7 signaling is not impaired by the initial starts:
During the initial start for an SILTD, the active CCNP selects one of the alternative
SILTDs.
During an initial start for an SILT group, the active CCNP handles signaling over
other SILT groups.




Recovery escalation of the initial starts for an SILTD
The CCNC safeguarding programs monitor the course of the recovery actions. If an
error occurs, CP safeguarding programs reconfigure the SILTD to operating state
unavailable. This does not disturb SS7 signaling traffic. The active CCNP switches to
one of the specified alternative SILTDs. Attempts at loading are carried out at periodic
intervals in the SILTD which is in the operating state unavailable. If these prove
successful, the unit is configured back to the standby state.
Recovery escalation of the initial starts for an SILT group
The CCNC safeguarding programs monitor the course of the recovery actions. If a fault
occurs, CCNC safeguarding programs reconfigure the SILTC to operating state
unavailable and the SILTDs connected to the SILTC to operating state not
accessible. The subsequent actions for the SILT group are initiated by the maintenance
personnel.
The SS7 signaling traffic is not disturbed either. The active CCNP handles signaling over
the specified alternative routes.
5.3.2 Recovery Level 2, Initial Start for a CCNP
The initial start for a CCNP is a hard switchover from the active CCNP to the standby
CCNP.
Hard switchover takes place in the following cases:
An error occurs which CCNC safeguarding programs are not able to clear
themselves.
The restarts of a CCNP escalate to an initial start.
The initial start for a CCNP is initiated by the CCNC safeguarding programs, and started
and monitored from the CP by the CP safeguarding programs.
Reason
Initial start for a CCNP can be initiated by the following:
CCNC user programs.
CCNC safeguarding programs.
CP safeguarding programs which monitor the CCNC.
Escalation of the restarts for a CCNP; for details on CCNC restart, see section
Escalation Strategy.
Recovery actions
The recovery actions take effect in both CCNPs:
In the previously active CCNP that is faulty and now becomes the standby CCNP.
In the previous standby CCNP that now becomes the new, active CCNP.
Recovery actions for the new, active CCNP (previous standby CCNP)
The CP safeguarding programs reconfigure the previous standby CCNP to active.
The new, active CCNP is immediately able to take over the SS7 signaling traffic,
because both CCNPs always contain the same current data.
Recovery actions for the faulty CCNP (previously active CCNP)
The CP safeguarding programs initiate the following actions:
Resetting and initialization of the faulty CCNP.
Reloading of program code and data into the initialized CCNP (from the CP).


Reloading of the transient data into the initialized CCNP (from the active CCNP).
Start of the reloaded CCNP.
Reconfiguration to operating state standby.
Routine tests run in the CCNP during standby. When data changes, the standby CCNP
is updated by the active CCNP, so that the standby CCNP is in a position to take over
the SS7 signaling tasks at any time.
Effects of recovery
MSUs (message signal units) that are located in the faulty CCNP and waiting to be
processed there are lost.
Recovery escalation
If an error occurs while a recovery is being performed in a CCNP, the CP safeguarding
programs initiate the following actions:
Reconfiguration of the faulty CCNP to unavailable.
Attempts at loading are carried out at periodic intervals. If these prove successful,
the unit is configured back to the standby state.
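The role swap performed by the hard switchover can be summarized in a short Python sketch. It is a simplified model under assumptions: the function name hard_switchover and the reload_ok flag are hypothetical, exactly one CCNP is assumed to be active and one on standby beforehand, and the periodic loading attempts for an unavailable CCNP are not modelled.

```python
def hard_switchover(states: dict[str, str], reload_ok: bool = True) -> dict[str, str]:
    """Return the CCNP operating states after an initial start for a CCNP."""
    faulty = next(name for name, state in states.items() if state == "active")
    standby = next(name for name, state in states.items() if state == "standby")
    new_states = dict(states)
    # The previous standby CCNP takes over the SS7 signaling traffic at once.
    new_states[standby] = "active"
    # The faulty CCNP is reset, reloaded and started; if this fails,
    # it is reconfigured to unavailable instead of standby.
    new_states[faulty] = "standby" if reload_ok else "unavailable"
    return new_states
```

For example, starting from CCNP0 active and CCNP1 on standby, a successful switchover leaves CCNP1 active and CCNP0 on standby.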
5.3.3 Recovery Level 2, Initial Start for the Entire CCNC
The initial start for the entire CCNC is the highest recovery level in the CCNC; both
CCNPs and all SILT groups are included in the recovery.
CP safeguarding programs initiate the initial start from the CP and monitor its course.
Reason
The initial start is initiated by:
System recovery which runs in the CP in one of the ISTART2 recovery levels
Recovery actions
The CP safeguarding programs initiate the following actions:
Resetting and initialization of the two CCNPs.
Reloading of the semipermanent data and of program code into the two CCNPs
(from the CP).
Reconfiguration of the two CCNPs; firstly, one of them to active, then the other one
to standby.
Reloading of the semipermanent data to all SILT groups (from the active CP).
Generation of the transient network data in both CCNPs.
Activation of the signaling links.
Restoration of the SS7 signaling traffic in the active CCNP.
Updating of the standby CCNP by the active CCNP.
Effects of recovery
None. Calls have already been cleared down by the CP.
Recovery escalation
If an error occurs while a recovery is being performed in a CCNP, the CP safeguarding
programs initiate the following actions:
Reconfiguration of the faulty CCNP to unavailable.
Reloading and activation at periodic intervals.




5.3.4 Escalation Strategy
The CCNC programs can choose between two options, depending on the severity and
extent of the software error, for the purpose of error treatment:
Clearing software error themselves.
The error-detecting CCNC program starts an error-treatment routine. The error-detecting
CCNC program passes on the task of saving the error symptoms (entering
them into the CCNC error notebook) to the CCNC safeguarding programs by means
of the restart call; see section CCNC restart below.
Having software error cleared by means of recovery.
The error-detecting CCNC program sends a recovery call to the CCNC safeguarding
programs, which in turn start a CCNC recovery level for the purpose of clearing the
software error.
Run check of the recovery levels: escalation
Each CCNC recovery level checks for itself whether the recovery actions performed are
successful; if the result is negative, the recovery level initiates an escalation.
Escalation with CCNC recovery means, depending on the particular recovery level:
Start of the next-highest CCNC recovery level.
Initiation of a specific action.
Figure Diagram of CCNC recovery escalation outlines the recovery levels and actions
of an escalation.
A higher-order check is carried out from the CP: within the framework of CCNC super-
vision, the CP safeguarding programs check whether a CCNC recovery level runs cor-
rectly.
Figure 2 Diagram of CCNC recovery escalation
[Figure content: three escalation paths]
Restart for an SILTC escalates to initial start for an SILT group; with escalation:
configure SILTC to unavailable and repeat loading process.
Restart for an SILTD escalates to initial start for an SILTD; with escalation:
configure SILTD to unavailable and repeat loading process.
Restart for a CCNP escalates to initial start for a CCNP; with escalation:
configure faulty CCNP to unavailable and repeat loading process.
CCNC restart
The CCNC restart is not a case of recovery and is explained here purely in the context
of the CCNC recovery escalation strategy.
The CCNC restart is called if CCNC programs detect a software error which they are
able to clear themselves. The software error can, for example, be an inconsistency in
transient data.
A restart call has no effect on either CCNC switching operation or data.
After the restart call, the error-detecting CCNC program continues its run. No further
actions are required.
Escalation of a CCNC restart
CCNC restart can arise in all three CCNC units. For this reason, each CCNC unit has
its own separate restarts:
Restart for an SILTD.
Restart for an SILTC.
Restart for a CCNP.
Escalation of each of these restarts is defined by means of error threshold limits. Each
CCNC unit has its own error counting system with different error counters, to distinguish
between the type of the error that has occurred and the fault location (within the unit). If
an error counter exceeds the error threshold limit assigned to it, the CCNC restart
initiates one of the following actions, depending in each case on the type of error
counter:
Output of an error message on the management system and entry of the error
message in a history file.
Escalation to one of the CCNC initial starts in recovery level 2.
Escalation of the restarts is shown in figure Diagram of CCNC recovery escalation
above; the diagram does not show the entry of error messages in the history file.
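The threshold mechanism for restarts can be illustrated with a minimal sketch. The error types, limits and action names below are hypothetical placeholders; the real counters and their limits are internal to each CCNC unit and are not given in this description.

```python
from collections import Counter


class RestartEscalation:
    """Per-unit error counting with one threshold per error counter."""

    def __init__(self, thresholds):
        # thresholds: {error_type: (limit, action)}; all values are assumed.
        self.thresholds = thresholds
        self.counters = Counter()

    def report(self, error_type):
        """Count one restart call; return the action once the limit is exceeded."""
        self.counters[error_type] += 1
        limit, action = self.thresholds[error_type]
        if self.counters[error_type] > limit:
            return action
        # Below or at the limit: the restart call has no further effect.
        return None
```

Usage, with assumed values: a counter with limit 2 returns no action for the first two errors and returns its action ("error message" or "initial start") from the third error onward.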
5.4 RSU Recovery
The elements of a remote switching unit (RSU) affected by recovery are the RSU control
(RSUC), the message handler (MH), the digital interface units (DIU) and the timeslot
interchange modules (TSIM), which are installed locally in the host timeslot inter-
changes (HTI) and remotely in the remote timeslot interchanges (RTI). Recovery of the
RSU control is controlled by the coordination processor (CP), whereas recovery of the
MH, DIU and timeslot interchange module is controlled by the RSU control.
There is no specific RSU recovery initiated by means of MML commands. Recoveries
in the RSU are triggered by internal processor events, by the configuration of RSU units
or by a system recovery in the CP. The latter case involves a recovery of the HTI and
RTI, which takes place after activation of the switching network (SN) and before activa-
tion of the local line/trunk groups (LTG).
There are two recovery levels for RSU units:
Recovery Level 2.1
Recovery Level 2.2




5.4.1 Recovery Level 2.1
Reason
This recovery level is restricted to one specific RSU unit and is initiated for the following
reasons:
Automatic request due to software errors.
System recoveries (ISTART1 or ISTART2, requested via a command from the RSU
control).
Configuration (requested via a command from the RSU control).
Corruption of program code in the RAM.
Detection of endless loops (if the operating system is working with pre-emptive mul-
titasking, otherwise an endless loop causes a watchdog event leading to recovery
level 2.2).
Recovery actions
The following actions are taken during a recovery level 2.1:
The processor is reset (without resetting the application specific integrated circuit
(ASIC)).
The content of the FEPROM is copied to the RAM.
The operating system is restarted.
All data are initialized.
All processes are restarted.
The semipermanent data are reloaded (only in ISTART2).
Effects of recovery
Due to the temporary unavailability of RSU units, this recovery level leads to the loss of
calls in the process of being set up or cleared down (transient calls).
Table Downtimes for transient calls (recovery level 2.1 in the RSU units) lists the
downtimes for transient calls caused by recovery level 2.1 in various units of the RSU.
RSU unit                              Downtime
RSU control HTI (RSUC-HTI)            < 10 s + 2 s to load data
RSU control RTI (RSUC-RTI)            < 35 s + 2 s to load data
MH                                    < 5 s
Timeslot interchange module (TSIM)    15 s
DIU                                   5 s
Table 12 Downtimes for transient calls (recovery level 2.1 in the RSU units)
The loading of data only takes place under certain conditions. Established calls are
unaffected.
Recovery escalation
If a recovery is unsuccessful, or if a new recovery request is received from software error
treatment (SWET), recovery level 2.1 can escalate to recovery level 2.2.
The DIU does not accept any new commands during the time that it is executing a
recovery. It safeguards established calls during a recovery level 2.1. The PCM maintenance
data must be reloaded to the DIU by the RSU control after every DIU recovery. Internal
DIU faults always lead to a DIU recovery at level 2.2.
5.4.2 Recovery Level 2.2
Reason
The faulty unit requests recovery level 2.2 due to:
Escalation.
A watchdog event.
System recoveries (ISTART2R, ISTART2F or ISTART2G, requested via a
command from the RSU control).
Configuration (requested via a command from the RSU control).
Recovery actions
A recovery at level 2.2 in an RSU control resets the message handlers at recovery level
2.2.
After recovery in the MHs, the RSU control interrogates the MH software. If the software
in the MH FEPROM has not been corrupted, the MHs are reactivated; otherwise they
are taken out of service.
The following actions are carried out during a recovery at level 2.2:
The processor and the ASIC are reset.
The content of the FEPROM is copied to the RAM.
The data areas are formatted.
Program code is loaded, as an option or mandatory (only those units that are depen-
dent on the CP generation, such as RSU control and MH).
The operating system is restarted.
All data are initialized.
All processes are restarted.
The semipermanent data are loaded.
Effects of recovery
Due to the temporary unavailability of RSU units, this recovery level leads to the loss of
calls in the process of being set up or cleared down (transient calls).
Table Downtimes for transient calls (recovery level 2.2 in the RSU units) lists the down-
times for transient calls caused by recovery level 2.2 in various units of the RSU.
RSU unit                      Downtime
RSU control HTI (RSUC-HTI)    < 25 s + 3-27 min to load program code
RSU control RTI (RSUC-RTI)    < 40 s + 5 min to load program code
MH                            < 10 s + 2-15 min to load program code
TSIM                          25 s
DIU                           10 s
Table 13 Downtimes for transient calls (recovery level 2.2 in the RSU units)
All of these times vary according to the actual quantity of program code (the time needed
to copy the code from the FEPROM to the RAM). The RSU control in the RTI (RSUC-RTI)
requires, in the worst case, 25 seconds to restore the high-speed signaling links
(HSL) in addition to the RSUC-HTI time. Loading of program code is conditional during
an ISTART2R or ISTART2G or a configuration, and unconditional during an ISTART2F.
The loading time depends on the number of message channels (MCH) used.
Furthermore, recovery level 2.2 leads to the loss of established calls and to interruptions
at all interfaces.
The downtimes for established calls are listed in table Downtimes for established calls
(recovery level 2.2 in the RSU units).
RSU unit                              Downtime
RSU control (RSUC)                    no effect
MH                                    5 s failure of HSL, MCH
Timeslot interchange module (TSIM)    25 s failure of all links on this side
DIU                                   10 s failure of PCM
Table 14 Downtimes for established calls (recovery level 2.2 in the RSU units)
There is no level-2.2 recovery for individual timeslot interchange modules (TSIM). The
effects of recovery for ISTART2R, ISTART2F and ISTART2G and for configuration are
not mentioned, because in these cases recovery is preceded by the system, side or
RSU unit being out of service.
5.5 MCP Recovery
Recoveries in the media control platform (MCP) are triggered by its internal subsystems
or by the coordination processor (CP).
The following MCP recovery levels are possible:
MCP Initialization
MCP Soft Reset
MCP Full Reset
MCP initialization and MCP soft reset affect media control tasks (MCT) only.
The MCP full reset affects the whole MCP with all MCTs.
A recovery cannot be invoked by an MML command at the CP, but only implicitly during
online operations, for instance in the case of an MCP deactivation or as a result of a
recovery escalation.
5.5.1 MCP Initialization
The CP requests a recovery with the level initialization.
Reason
The following reasons can lead to this request:
System recovery in recovery levels ISTART1 and ISTART2.
Configuration of the MCP from the maintenance state to the active state (initiated
by the CP due to the postprocessing of an ISTART1, ISTART2 or ISTART2R).
Recovery actions
The message queues for all media control tasks (MCT) of the MCP are reset. The
MCPUs and subsystems stay active.
Effects of recovery
During the MCP initialization:
The calls in the process of being set up or cleared down are aborted and lost.
The existing calls are cleared down.
Recovery escalation
A failed MCP initialization can escalate to an MCP full reset; see Escalation Strategy.
5.5.2 MCP Soft Reset
The CP requests a recovery with the level soft reset.
Reason
The MCP soft reset can be initiated by the following:
System recovery in recovery level ISTART2R/ISTART2G.
Configuration of the MCP from the maintenance state to the active state (initiated
by MML command, conditional).
Configuration of the MCP from the unavailable or not accessible state to the
active state (initiated by the CP due to a successful re-connection to the MCP or
to routine tests).
MCT 2.1 bulk recovery.
Recovery actions
MCP soft reset initiates:
Deletion of all media control tasks (MCT) of the MCP.
Resetting of all message queues for the MCTs.
Effects of recovery
During the MCP soft reset:
The calls in the process of being set up or cleared down are aborted and lost.
The existing calls are cleared down.
If the recovery was initiated by the CP, the MCP and all its MCTs may be configured to
active by the CP.
Recovery escalation
A failed MCP soft reset can escalate to an MCP full reset; see Escalation Strategy.
5.5.3 MCP Full Reset
Reason
The MCP full reset can be initiated by the following:
System recovery in recovery level ISTART2F.
Configuration of the MCP from the maintenance state to the active state (initiated
by MML command (unconditional) or by the CP due to the postprocessing of an
ISTART2F).
A major software error detected by an MCP subsystem (for example, the software
watchdog task detects an endless loop of an MCP subsystem).
Persistent isolation of the MCP, that is, the (logical) SCTP connection between
message buffer and MCP is broken.




Escalation of MCP initialization (due to postprocessing after an ISTART1, ISTART2
or ISTART2R).
Escalation of MCP soft reset (MCT 2.1 bulk recovery).
Recovery actions
MCP full reset initiates:
Resetting of both MCPUs.
Starting up of both MCPUs and all MCP subsystems.
Configuring of the MCP and all its media control tasks (MCT) to active by the CP.
Loading of the MCP software, if it is not up to date, from the packet control unit
serving as DHCP server.
Effects of recovery
During the MCP full reset:
The calls in the process of being set up or cleared down are aborted and lost.
The existing calls are cleared down.
Recovery escalation
No further escalation level.
If recovery level MCP full reset is not completed successfully, coordination processor
safeguarding programs configure the MCP to unavailable.
5.5.4 Error Handling in Case of Isolation
An isolation condition of the MCP exists if the (logical) SCTP connection between
message buffer and MCP is broken. If the isolation state persists after a certain time
(180 minutes) the MCP maintenance task enforces an MCP full reset.
Notes:
This strategy is not performed during upgrade.
The timer values can only be adapted via patch.
The strategy only takes effect if at least one SCTP connection was up at some point
and then failed.
The strategy is implemented in order to perform a kind of self-healing process on the
MCP. This process may make it unnecessary for a craftsperson to travel to the
location of the isolated MCP in order to replace it with a different, fault-free one.
It is not possible to repair the MCP using MML commands at the CP.
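The isolation rule and its notes can be summarized in a short decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and whether the reset is enforced exactly at 180 minutes or only after that time has been exceeded is an assumption here.

```python
ISOLATION_LIMIT_MIN = 180  # timer value from the text; adaptable only via patch


def isolation_action(minutes_isolated, had_connection_before, upgrade_running):
    """Decide what the MCP maintenance task does for an isolated MCP."""
    # The strategy is not performed during upgrade, and it only takes effect
    # if at least one SCTP connection was up at some point and then failed.
    if upgrade_running or not had_connection_before:
        return "no action"
    if minutes_isolated >= ISOLATION_LIMIT_MIN:  # boundary handling is assumed
        return "MCP full reset"
    return "keep waiting"
```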
5.5.5 Escalation Strategy
In the context of the postprocessing after an ISTART1, ISTART2 or ISTART2R, an MCP
initialization is executed. If this is not completed successfully, an escalation to MCP full
reset takes place.
Irrespective of the reason, a failed MCP full reset leads to configuring the MCP to the
unavailable state.
An MCT 2.1 bulk recovery leads to an MCP soft reset. A second MCT 2.1 bulk recovery
within a time period of one hour escalates to an MCP full reset.
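The one-hour rule for MCT 2.1 bulk recoveries can be sketched as follows. The class name and the use of seconds as the time unit are assumptions; the sketch only tracks the time of the previous bulk recovery.

```python
class BulkRecoveryEscalation:
    """Escalates a second MCT 2.1 bulk recovery within one hour."""

    WINDOW_S = 3600  # one hour, as stated in the text

    def __init__(self):
        self.last_bulk_s = None  # time of the previous bulk recovery, if any

    def on_bulk_recovery(self, now_s):
        """Return the MCP recovery level triggered by a bulk recovery at now_s."""
        previous = self.last_bulk_s
        self.last_bulk_s = now_s
        if previous is not None and now_s - previous <= self.WINDOW_S:
            return "MCP full reset"
        return "MCP soft reset"
```

A first bulk recovery leads to an MCP soft reset; a second one 30 minutes later leads to an MCP full reset, whereas one arriving more than an hour after the previous bulk recovery again leads to a soft reset.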


Escalation: escalation parameters
On the basis of this supervision period and the error threshold limit, the MCP safeguard-
ing programs decide, when further errors occur, whether an escalation to a higher
recovery level should be initiated or not.
Escalation takes place if the following conditions are fulfilled:
Further errors occur while the recovery level is running or during the subsequent
supervision period.
The error threshold limit is reached.
Table MTP recovery levels and their escalation parameters contains the escalation
parameters for each recovery level. These are: the duration of the supervision period,
the error threshold limit, and the next-highest recovery level.
MCP recovery level    Supervision time    Error threshold limit    Escalation to
MCP initialization    -                   1                        MCP full reset 1)
MCP soft reset        -                   1                        MCP full reset 1)
                      60 min              2                        MCP full reset 2)
MCP full reset        -                   1                        no escalation; MCP is configured to unavailable
Table 15 MTP recovery levels and their escalation parameters
Notes relating to table MTP recovery levels and their escalation parameters:
1) in the context of the postprocessing after an ISTART
2) if the reason for the MCP soft reset was an MCT 2.1 bulk recovery
5.6 MCT Recovery
A media control task (MCT) acts as a virtual LTG, that is, from the CP point of view there
is no difference between an LTG and an MCT. Thus the recovery behavior of the MCTs
and LTGs is almost identical and the remarks on LTG recoveries apply accordingly to
MCTs (see section LTG Recovery). The following topics point out the differences.
5.6.1 Recovery Level 1
See section Recovery Level 1 (LTG New Start).
5.6.2 Recovery Level 2.1
See section Recovery Level 2.1 (LTG Initial Start 1).
5.6.3 Recovery Level 2.2
The LTG recovery level 2.2 comprises the reloading of program code. Since one copy
of the media control task (MCT) code is shared by all the MCTs running on an MCP,
recovery level 2.2 cannot be used for MCTs. Instead, for MCTs, all reasons that would
cause a recovery level 2.2 for an LTG lead to an MCP soft reset
(see section Escalation Strategy).
5.6.4 MCT 2.1 Bulk Recoveries
If the number of media control tasks (MCT) of an MCP executing an MCT recovery level
2.1 at the same time exceeds a predefined limit then the CP initiates an MCP soft reset.
This is done because this behavior can be caused by corrupt MCT or MCP software.
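As an illustration, the predefined limit can be computed from the values given in table MCT recovery levels and their escalation parameters (75% of all active MCTs, minimum 5, maximum 30). The function names are hypothetical, and rounding the 75% value up to a whole number of MCTs is an assumption.

```python
import math


def bulk_recovery_limit(active_mcts):
    """Limit for simultaneous level 2.1 recoveries: 75%, clamped to 5..30."""
    return max(5, min(30, math.ceil(0.75 * active_mcts)))  # rounding is assumed


def is_bulk_recovery(recovering_mcts, active_mcts):
    """True if the number of recovering MCTs exceeds the limit."""
    return recovering_mcts > bulk_recovery_limit(active_mcts)
```

With 20 active MCTs the limit is 15, so 16 simultaneous level 2.1 recoveries would trigger an MCP soft reset; with only 4 active MCTs the minimum of 5 applies.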
5.6.5 Escalation Strategy
Since there is no recovery level 2.2 for media control tasks (MCT), the escalation
strategy is a little different.
If a problem of one MCT cannot be cleared by a number of level 2.1 recoveries within
a certain time, the relevant MCT is configured to unavailable by the CP (see table MCT
recovery levels and their escalation parameters).
MCT restart or MCT recovery level           Supervision time  Error threshold limit     Escalation to
MCT restart                                 5 min             10                        MCT recovery level 1
MCT recovery level 1                        5 min             5                         MCT recovery level 2.1
MCT recovery level 2.1 (one MCT of an MCP)  120 min           9                         no escalation; MCT is configured to unavailable
MCT recovery level 2.1 (several MCTs of an                    75% of all active MCTs,   MCP soft reset (see section MCP Soft Reset)
MCP at the same time: MCT 2.1 bulk                            minimum 5, maximum 30
recoveries)
Table 16 MCT recovery levels and their escalation parameters
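The single-MCT rows of the table can be read as a lookup from the current level to its supervision time, error threshold limit and escalation target. The sketch below assumes that escalation takes place when the number of events within the supervision time reaches the limit; the function name and the representation of events as timestamps in minutes are hypothetical.

```python
# level: (supervision time in minutes, error threshold limit, escalation target)
ESCALATION = {
    "MCT restart": (5, 10, "MCT recovery level 1"),
    "MCT recovery level 1": (5, 5, "MCT recovery level 2.1"),
    "MCT recovery level 2.1": (120, 9, "configure MCT to unavailable"),
}


def next_step(level, event_times_min, now_min):
    """Return the escalation target if the threshold is reached, else level."""
    window, limit, target = ESCALATION[level]
    # Only events inside the supervision time count toward the threshold.
    recent = [t for t in event_times_min if now_min - t <= window]
    return target if len(recent) >= limit else level
```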


6 MP Recovery
The main processors (MP) incorporated in the SSNC have their own recovery concept
(only in the network node configuration with an SSNC).
6.1 Recovery Concept
Basic principles
There can be no guarantee that software is completely free of bugs or errors. For this
reason, the SURPASS hiE 9200 software contains mechanisms for reporting software
problems and for preventing these problems from having a negative effect on the rest of
the system.
The Software Recovery complex is responsible for dealing with software problems
detected and reported by the application software, and for restoring service. The
recovery software is divided into two parts: Software Error Treatment (SWET) and
Startup.
Figure 3 Handling of software problems by Software Recovery
The regular application software is not capable of launching a recovery. The application
software must contain internal error-neutralization procedures for every known potential
software problem within its programs.
The only action that can be taken by the application software is to report software
problems to SWET (arrow no.1 in figure Handling of software problems by Software
Recovery). SWET analyzes and evaluates these software problems and saves
symptoms relating to the software problems. Normally, SWET returns control to the
application program, which then completes its own fault treatment routines and contin-
ues its normal operations (arrow no.2). Only in the event of serious problems does
SWET request a recovery (arrow no.3).
When a recovery is requested by SWET or by a system operator, it is the Startup part
of the complex that executes and supervises the recovery (arrow no.4). Depending on
the requested recovery level, Startup starts a user process or a recovery suite or
performs recovery for the entire platform (MP). Each recovery level has different effects
on the user processes.
[Figure 3 content: Software Recovery, consisting of Software Error Treatment (SWET) and
Startup, and the application software, connected by arrows 1 to 4.]
The application software receives information from Startup on the last recovery to be
run, which it uses to start its tasks.
The treatment of software errors is divided into the following categories:
a) Errors that can be neutralized by the application software.
These errors are detected by normal check functions in the application software or
by audit programs, and do not require any recovery action.
If an application program detects an error, it has to correct it itself. In some
cases, the error can be eliminated by re-establishing consistency between data
and the real-world conditions they represent, or by terminating a task. The appli-
cation is also capable of saving error symptoms.
Errors detected by audits are corrected by restoring data to a consistent and
valid status.
b) Errors that require a simple restart of the application software.
This category refers to errors in the application software that have been detected by
the operating system or by the hardware (for example, division by zero, incorrect
pointers), and to errors caused by repeated attempts to neutralize errors in the appli-
cation software.
If, at any time, the application software is restarted, it checks and corrects the data
relating to its last activity, to find out where the error lies. Reasons for restarting the
software include a break in communication with other processes, the loss of
messages in a sequence, and message sequence errors.
After a new start, the interrupted communication is either stopped, or restarted, or
continued after synchronization with the other software. The protocols used by the
application software contain error-detection mechanisms for this purpose (for
example, time-outs, sequence numbers).
c) Errors that require a synchronized startup of the platform.
This category includes errors that occur more frequently than the acceptable rate,
and specific types of errors (for example, in the operating system) that have to be
neutralized by restarting the entire platform.
It may be necessary to initialize or reload data areas in order to bring the system into
a defined status.
Recovery levels
In order to minimize as far as possible the impact on the system of actions to neutralize
software errors, there are a number of different recovery levels, each with different
effects on the system.
Two basic types of recovery level are employed in the MPs:
Low-level recoveries.
A low-level recovery involves the initialization of one process or of a list of pro-
cesses.
All of the relevant processes are started simultaneously.
High-level recoveries.
The difference between high-level and low-level recoveries is that high-level recov-
eries also involve the re-initialization of the operating system.
The startup is table-driven. The processes are started at their predefined synchroni-
zation points in accordance with the startup tables.


Synchronization
A table-driven startup system is implemented in the MPs. It is based on a framework of
synchronization points, defined by the startup team and the software developers and
implemented in the program code with the aid of special tools.
The synchronization points at system level are largely project-independent.
The synchronization points (SP) for the MPs are listed in table Synchronization points
(SP) in high-level recoveries. All synchronization points are reached simultaneously by
all MPs.
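A table-driven startup of this kind can be sketched as a mapping from processes to the synchronization point at which they are started. The synchronization point names are those listed in table Synchronization points (SP) in high-level recoveries; the process names and their assignment to synchronization points are purely hypothetical.

```python
SP_ORDER = ["SP0", "SP1", "SP2", "SP3", "SP4", "SP4.1", "SP4.5",
            "SP5", "SP6", "SP7", "SP8"]


def startup_sequence(startup_table):
    """Group processes by synchronization point, in synchronization order.

    startup_table: dict mapping a process name to the SP at which it starts.
    Returns a list of (sp, [process names]) for every SP that starts something.
    """
    plan = []
    for sp in SP_ORDER:
        processes = sorted(p for p, at in startup_table.items() if at == sp)
        if processes:
            plan.append((sp, processes))
    return plan
```

The point of the table-driven design is that the startup order is data rather than code: changing when a process starts means changing one table entry.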
6.2 Recovery Levels
6.2.1 Installation
There is no specific recovery level for installation in the MPs. When a system is activated
for the first time, a recovery is started including loading and formatting in the MP:SA. In
most cases, this recovery is run on the basis of the APS stored on the system disk,
without the database. Because the database is empty, only the MP:SA is activated, and
not the peripheral units. The periphery is set up and configured by the system operators
after the MP:SA has been activated.
6.2.2 Low-level Recoveries
Low-level recoveries are usually executed without the use of the synchronization mech-
anism. Depending on the level, they may involve a single process entity or all of the pro-
cesses on the platform. Low-level recoveries never include other dependent platforms.
SP       Meaning
SP0      The operating system is available and user mode has been started.
SP1      Basic maintenance functions (file system in the ATM node) are available.
         Downloading of the database can be started.
SP2      The database is available.
SP3      Maintenance and standard error detection and correction functions are
         available. Applications can be started.
SP4      All applications are up and running.
SP4.1    The service ring is available.
SP4.5    The PCM links to the rest of the network are open.
SP5      The rest of the signaling network is available.
SP6      The network node is capable of handling (signaling) relations with the rest of
         the network.
SP7      Recovery has been completed and postprocessing has started.
SP8      Postprocessing has been completed and startup is complete. Routing and
         audit functions have been started.
Table 17 Synchronization points (SP) in high-level recoveries
Recovery of a single process (PROREC2)
This recovery only affects one process, or a list of processes (recovery suite, see the
description of the software structure in the description Software Principles). All of the
resources being used by the process are released. The processes are restarted without
synchronization.
Processes that are started by the operating system are restarted automatically.
Processes that are started dynamically by other processes (for example, child processes)
are merely halted, and the parent process, if it is still running, is informed. Halted child
processes must be restarted by the parent process.
PROREC2 is executed, for example, in response to one of the following events:
Repeated error in the application software.
Serious error in the application software caused by messages.
Endless loop or errored stack conditions detected by a sanity check.
Lack of resources.
Recovery of all processes on a platform (PROREC3)
In this recovery, all operating-system resources occupied by processes (with the excep-
tion of transactions) are released by the operating system. All processes that were
started by the operating system are restarted. The consequences for user processes
are the same as in a PROREC2.
Unlike in PROREC2, the parent processes are not informed by the operating system.
The parent processes have to derive the information that their child processes have
been halted from the recovery level.
PROREC3 is executed, for example, in response to one of the following events:
Execution of an escalation step as the result of a defined number of unsuccessful
PROREC2 recoveries.
A request for PROREC2 during the execution of another PROREC2.
A request for PROREC2 for the startup supervision user process in the MP:SA.
The set of processes affected by PROREC2 exceeds a defined number.
Escalation of the recovery level due to excessively frequent software errors after a
PROREC2.
Explicit request by a system operator.
6.2.3 High-level Recoveries
All high-level recoveries involve the use of the synchronization mechanism, that is, pro-
cesses that are started by the operating system are started automatically and the syn-
chronization between them is supported.
Recovery of the entire platform (FULLREC/FULLREC-BOP)
High-priority errors and serious errors in control data that extend beyond the boundaries
of a single process lead to a FULLREC. In FULLREC, all operating-system resources
are released (by the operating system) and high-impact data that require initialization
(process data that are only initialized in connection with high-level recoveries) are auto-
matically initialized by the operating system. Low-impact data (process data that are ini-
tialized during every recovery) have to be initialized by the user processes.
FULLREC is executed, for example, in response to one of the following events:
Execution of an escalation step as the result of a defined number of unsuccessful
PROREC3 recoveries.


Endless loop in supervisor mode.
Explicit request by a system operator.
A FULLREC-BOP (FULLREC basic operation) involves the execution of all FULLREC
actions and, in addition, the reduction of the quantity of software to be started.
FULLREC-BOP is executed, for example, in response to one of the following events:
A request for LOADREC2-local by Software Error Treatment when the currently
valid generation cannot be loaded (only in the dependent MPs, that is, all MPs other
than MP:SA).
A request for PROREC2/3 or FULLREC at a time when basic operation is in effect
due to a FULLREC-BOP.
An unsuccessful attempt to restore normal operation following a request by a system
operator at a time when basic operation is in effect due to a FULLREC-BOP.
Execution of an escalation step as the result of an unsuccessful FULLREC in the
MP:SA.
Recovery with loading (LOADREC2-local/LOADREC2-BOP)
LOADREC2-local recovery is designed to eliminate serious errors in the database. It is
available in all dependent MPs (all MPs other than MP:SA). This recovery acts on a
single platform only. All other platforms (MP, PCP, CP) remain in normal operating
mode.
A LOADREC2-local involves the execution of all FULLREC actions plus the following:
The memory of the affected MP platform is formatted, with the exception of the save
area and the communication area. The save area contains nothing but error symp-
toms.
Program code and semipermanent data are loaded from the hard disk.
Transient copies of the semipermanent data are loaded from other MP platforms.
Transient copies of the semipermanent data are updated on other MP platforms
(MP, PCP, CP).
Communication with the MP in which the recovery is run is interrupted at the beginning
of the LOADREC2 and resumed at the end of the recovery.
LOADREC2-local is executed, for example, in response to one of the following events:
Execution of an escalation step as the result of a defined number of unsuccessful
FULLREC recoveries.
Explicit request by a system operator.
Hardware fault.
A LOADREC2-BOP (LOADREC2 basic operation) involves the execution of all
LOADREC2-local actions and, in addition, the reduction of the quantity of software to be
started.
LOADREC2-BOP is executed, for example, in response to one of the following events:
Execution of an escalation step as the result of a defined number of unsuccessful
LOADREC2-local recoveries in an MP that is vital to the system.
A request for PROREC2/3, FULLREC or LOADREC2-local at a time when basic
operation is in effect due to a LOADREC2-BOP.
An unsuccessful attempt to restore normal operation following a Q3 request for
LOADREC2-local at a time when basic operation is in effect due to a LOADREC2-BOP.




Recovery with loading (LOADREC2-system/LOADREC3)
A LOADREC2-system involves the execution of all LOADREC2-local actions on all
platforms in the system.
LOADREC2-system is executed, for example, in response to one of the following
events:
Execution of an escalation step as the result of a defined number of unsuccessful
LOADREC2-BOP recoveries.
Explicit request by a system operator.
A LOADREC3 involves the execution of all LOADREC2-system actions plus the following:
Formatting of the entire memory.
Execution of fallback to an older generation.
A LOADREC3 is executed, for example, in response to the execution of an escalation
step as the result of a defined number of unsuccessful LOADREC2-system recoveries
(error in the generation).
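The way each loading recovery builds on a lower level can be pictured as a small inheritance table. The following is only an illustrative sketch, not product code: the dictionary layout and the function name actions_for are assumptions, and the FULLREC actions are represented by a placeholder because they are not itemized here.

```python
# Hypothetical model: each level names the level it builds on and the actions it adds.
LEVELS = {
    "FULLREC": (None, ["FULLREC actions (placeholder, not itemized here)"]),
    "LOADREC2-local": ("FULLREC", [
        "format memory of the affected MP except save area and communication area",
        "load program code and semipermanent data from the hard disk",
        "load transient copies of semipermanent data from other MP platforms",
        "update transient copies of semipermanent data on other platforms",
    ]),
    "LOADREC2-BOP": ("LOADREC2-local", [
        "reduce the quantity of software to be started",
    ]),
    "LOADREC2-system": ("LOADREC2-local", [
        "apply the LOADREC2-local actions on all platforms in the system",
    ]),
    "LOADREC3": ("LOADREC2-system", [
        "format the entire memory",
        "fall back to an older generation",
    ]),
}


def actions_for(level):
    """Return the cumulative action list for a recovery level."""
    base, extra = LEVELS[level]
    inherited = actions_for(base) if base else []
    return inherited + extra


if __name__ == "__main__":
    for action in actions_for("LOADREC3"):
        print(action)
```

Calling actions_for("LOADREC3") lists the placeholder FULLREC entry first, then the LOADREC2-local and LOADREC2-system additions, and finally the two LOADREC3 actions.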
6.3 Parallel-running Recoveries
Tables Parallel-running recoveries (initiated due to software error) and Parallel-running
recoveries (initiated by a system operator) show the times at which a new recovery can
be processed in parallel to an already running recovery.
                 | New (interrupting) recovery
Running recovery | lower level                                             | same level                  | higher level
PROREC2          | -                                                       | after completion            | at any time
PROREC3          | RPM received from affected processes                    | at any time                 | at any time
FULLREC          | PROREC2: after SP4; Others: after SP7                   | after initialization of CSC | at any time
LOADREC2-local   | PROREC2: after SP4; Others: after SP7 + escalation time | after SP7 + escalation time | at any time
LOADREC3         | PROREC2: after SP4; Others: after SP7 + escalation time | after SP7 + escalation time | at any time
Table 18 Parallel-running recoveries (initiated due to software error)
                 | New (interrupting) recovery
Running recovery | lower level | same level       | higher level
PROREC2          | -           | -                | at any time
PROREC3          | -           | after completion | at any time
Table 19 Parallel-running recoveries (initiated by a system operator)


Recovery of a single process in parallel to a synchronized recovery
Recoveries for single processes can be executed in parallel to a synchronized recovery
as soon as all processes have sent their Recovery Progress Message and synchronization
point SP4 has been reached.
Platform recovery in parallel to a synchronized recovery
A synchronized recovery can be interrupted by low-level recoveries as soon as
synchronization point SP7 has been reached. By synchronization point SP7, the recovery
has been completed and postprocessing commences. All applications, with the exception of
those involved in postprocessing, should have performed their required functions up to
this synchronization point. All functions required between synchronization points SP7
and SP8 must be performed by the postprocessing applications, even if the synchronized
recovery is interrupted by a non-synchronized low-level recovery.
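The two rules for interrupting a synchronized recovery can be condensed into a single check. This is a minimal sketch under stated assumptions: the function name, the scope labels "process" and "platform", and the representation of synchronization points as integers (SP4 as 4, SP7 as 7) are all hypothetical and serve illustration only.

```python
def may_run_in_parallel(scope, sync_point_reached, all_rpm_received):
    """Decide whether a new recovery may run in parallel to a synchronized recovery.

    scope              -- "process" (single-process recovery) or "platform"
                          (low-level platform recovery); labels are assumptions
    sync_point_reached -- highest synchronization point reached so far, as an int
    all_rpm_received   -- True once every process has sent its Recovery Progress Message
    """
    if scope == "process":
        # Single-process recovery: all RPMs received and SP4 reached.
        return all_rpm_received and sync_point_reached >= 4
    if scope == "platform":
        # Low-level platform recovery: SP7 reached (postprocessing has begun).
        return sync_point_reached >= 7
    raise ValueError(f"unknown scope: {scope!r}")


if __name__ == "__main__":
    print(may_run_in_parallel("process", 4, True))
    print(may_run_in_parallel("platform", 5, True))
```

In this sketch, a single-process recovery requested at SP4 with all RPMs received is admitted, whereas a platform recovery requested at SP5 has to wait until SP7.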
6.4 Escalation
Internal escalation
The recoveries involving basic operation (FULLREC-BOP and LOADREC2-BOP) are
split into recovery with extended basic operation (BOP extended) and recovery with
kernel basic operation (BOP kernel). Symptom saving is not performed in connection
with BOP kernel. In the cases described (see sections Recovery of the entire platform
(FULLREC/FULLREC-BOP) and Recovery with loading (LOADREC2-local/LOADREC2-BOP)),
the first escalation is to BOP extended and the next to BOP kernel. This principle
allows service restriction to be graduated (see also table MP recovery levels and
their escalation parameters).
                 | New (interrupting) recovery
Running recovery | lower level                 | same level                                               | higher level
FULLREC          | after SP7                   | after SP7                                                | MP:SA: after SP7; Other MPs: at any time
LOADREC2-local   | after SP7 + escalation time | MP:SA: after SP7 + escalation time; Other MPs: after SP8 | MP:SA: after SP7; Other MPs: at any time
LOADREC3         | after SP7 + escalation time | after SP7 + escalation time                              | after SP7
Table 19 Parallel-running recoveries (initiated by a system operator) (Cont.)
Notes relating to tables Parallel-running recoveries (initiated due to software error) and
Parallel-running recoveries (initiated by a system operator):
RPM  Recovery Progress Message
CSC  Critical State Control
SP   Synchronization Point (see Synchronization)




Escalation parameters
Regardless of whether the MP program that detected the error issued a restart
command or a recovery command, in both cases a watchdog timer is started: The MP
safeguarding programs monitor the arrival of further restart or recovery commands
during the supervision time.
The length of the supervision time depends on the recovery level that has been started.
In addition to the supervision time, an error threshold is defined in each case; see table
MP recovery levels and their escalation parameters.
The MP safeguarding programs use the supervision time and the error threshold to
decide whether or not to initiate escalation to a higher recovery level in the event of
further problems.
An escalation is initiated if the following conditions are fulfilled:
Further problems are detected during execution of the recovery level or during the
subsequent supervision time.
The error threshold is reached.
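The two escalation conditions can be sketched as a small monitor object. The following is only an illustration under assumptions: the class name, the sliding-window interpretation of the supervision time, and the choice to count the first problem toward the threshold are hypothetical. The example values (50 s, threshold 5, escalation to FULLREC) are those given for PROREC3 in table MP recovery levels and their escalation parameters.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationMonitor:
    """Hypothetical supervision of one recovery level (not the product implementation)."""
    level: str
    supervision_time_s: float
    error_threshold: int
    next_level: str
    _timestamps: list = field(default_factory=list)

    def report_problem(self, now_s):
        """Record a problem at time now_s and return the recovery level to run next."""
        # Assumption: only problems inside the supervision window are counted.
        self._timestamps = [
            t for t in self._timestamps if now_s - t <= self.supervision_time_s
        ]
        self._timestamps.append(now_s)
        if len(self._timestamps) >= self.error_threshold:
            # Threshold reached within the supervision time: escalate.
            self._timestamps.clear()
            return self.next_level
        return self.level


if __name__ == "__main__":
    monitor = EscalationMonitor("PROREC3", 50.0, 5, "FULLREC")
    for t in (0, 10, 20, 30, 40):
        print(t, monitor.report_problem(t))
```

With five problems reported 10 s apart, the first four calls stay at PROREC3 and the fifth returns FULLREC. Problems spaced further apart than the supervision time never accumulate to the threshold in this sketch.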
Table MP recovery levels and their escalation parameters lists the escalation parameters
associated with each recovery level: length of supervision time, error threshold and
the next higher recovery level.
Recovery level        | Supervision time                                   | Error threshold | Escalation to
PROREC2               | 50 s                                               | 10              | PROREC3
PROREC3               | 50 s                                               | 5               | FULLREC
FULLREC               | 480 s                                              | 3               | FULLREC-BOP-extended 1) or LOADREC2-local 2)
FULLREC-BOP-extended  | 5 min                                              | 1               | FULLREC-BOP-kernel or LOADREC2-system 3)
FULLREC-BOP-kernel    | 5 min                                              | 1               | LOADREC2-system 3)
LOADREC2-local        | 20 min                                             | 3               | LOADREC2-BOP-extended 4) or configuration to not accessible 5)
LOADREC2-BOP-extended | 40 min                                             | 1               | LOADREC2-BOP-kernel
LOADREC2-BOP-kernel   | 40 min                                             | 1               | LOADREC2-system
LOADREC2-system       | at least 15 min after SP0 and at least SP7 reached | 2               | LOADREC3
Table 20 MP recovery levels and their escalation parameters
Notes relating to table MP recovery levels and their escalation parameters:
1) in the MP:SA
2) in the dependent MPs (all MPs other than MP:SA)
3) without exception if a total outage is detected
4) in the dependent MPs (all MPs other than MP:SA) vital to the system
5) in the dependent MPs (all MPs other than MP:SA) not vital to the system
SP synchronization point (see Synchronization)
6.5 Symptom Saving
Recovery makes use of two forms of symptom saving:
Standard Symptom Saving
Data Collection: Additional Symptom Collection and Immediate Symptom Saving
6.5.1 Standard Symptom Saving
Standard symptom saving is used in normal operating mode and especially in all
recovery levels in the MPs. Prior to the recovery actions, it writes the relevant
symptoms to a save area, that is, a recovery-protected data area in the main memory
of the MP in question.
The following information is output or saved in standard symptom saving:
An advisory message indicating that an error has occurred (data collected from the
software incident report and software error report commands (SWINC, SWERR),
module name, subsystem).
An advisory message concerning load actions to the MP, PCP (this message is
omitted if loading is not necessary).
An advisory message indicating that recovery has been completed and that call
processing continues without problems.
Symptoms of the problem.
As a rule, the advisory messages are output at the management system and saved to a
history file; see History files in the system.
Standard symptom saving for recoveries in MP and PCP
Before a recovery is launched, Software Error Treatment collects error symptoms and
compiles a symptom package which is written to a save area of the main memory.
When the recovery has been completed, Software Error Treatment transfers the
symptom package from the save area to the history file SG.SESYMPE; see History files
in the system.
History files in the system
History files are created in duplicate on both system disks for the saving of a variety of
information. The number of history files and their type is variable; the following list is
merely one example:
SG.SESYMPE is used to store symptoms reported by Software Error
Treatment (SWET) relating to software problems that have
occurred (see Basic principles)
SG.OPERE is used to store safeguarding messages
SG.SUPE is used to store startup symptoms




6.5.2 Data Collection: Additional Symptom Collection and
Immediate Symptom Saving
Reasons for data collection
Additional symptom collection, also referred to as data collection, takes place in addition
to standard symptom saving when a system recovery is interrupted as the result of a
software problem and escalates to a recovery with memory formatting.
The interruption of a system recovery in execution is an exceptional case. In this
case, data collection is started before any recovery actions, and thus prior to
formatting of the memory.
Position of data collection in the execution sequence
In general terms, the sequence of actions can be described as follows:
An exception case has occurred, causing an executing recovery to be interrupted.
Data collection is activated in response to the interruption of recovery. It collects
supplementary symptoms relating to the interruption in addition to the standard
symptoms and saves them immediately to both system disks (redundant save).
When data collection has been completed, the recovery program initiates the
next-higher recovery level (see also section Escalation). If a further software problem
should occur during execution of the next-higher recovery level, that level is
interrupted once again: Data collection gathers another set of data-collection symptoms,
after which the next-higher recovery level is launched.
When this new recovery level has been executed without error (recovery is
complete and the system is operating in normal mode), a system operator enters a
Q3 task to initiate the editing of the data-collection symptoms stored on the system
disk. They can then be copied to the management system.
The edited data-collection symptoms are used by the system vendor to analyze and
resolve the source of the problem.
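The sequence above can be traced with a short simulation. This is a minimal sketch under assumptions, not a description of the actual recovery program: the function name, the event strings, and the idea of passing in the set of levels that get interrupted are all hypothetical, and the level names in the usage example are chosen only for illustration.

```python
def run_recovery_sequence(levels, interrupted):
    """Trace recovery and data collection for an ordered list of escalation levels.

    levels      -- recovery levels in escalation order (lowest first)
    interrupted -- set of levels assumed to be interrupted by a software problem
    """
    events = []
    collected_sets = 0
    for level in levels:
        events.append(f"start {level}")
        if level in interrupted:
            # Interruption: collect symptoms and save them at once, then escalate.
            collected_sets += 1
            events.append(f"collect data-collection symptoms (set {collected_sets})")
            events.append("save symptoms to both system disks")
            continue
        events.append("normal operation restored")
        if collected_sets:
            events.append("operator may request editing of saved symptoms")
        return events
    events.append("no higher recovery level left in this sketch")
    return events


if __name__ == "__main__":
    trace = run_recovery_sequence(
        ["FULLREC", "LOADREC2-local", "LOADREC2-system"], {"FULLREC"}
    )
    for event in trace:
        print(event)
```

In the usage example, the first level is interrupted, one set of symptoms is collected and saved, and the next level completes, after which editing of the saved symptoms can be requested.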
Execution of data collection
The procedures involved in a data collection action can be classified as follows:
Collection of data-collection symptoms relating to the interruption of recovery
Immediate saving of data-collection symptoms to the system disks
Editing of collected symptoms to a readable format and their output to system disk
These three procedures are described briefly in the following sections.
Collection of data-collection symptoms relating to the interruption of recovery
Data Collection collects the following symptoms:
Symptoms package relating to the software problem encountered.
LOGbook recorded by the Safeguarding Monitor.
Control data and status data relating to recovery in the MPs.
Control data and status data relating to recovery in the PCPs.
Trace points of the interrupted recovery levels (trace-point history).
History of recovery escalations prior to interruption of recovery.
Vital operating-system data.
History files in the system (continuation of the example list):
SG.SWFPE is used to store the symptoms collected by system-wide (cross-platform)
Software Error Treatment
SG.AUDITE is used to store symptoms collected by audits
Immediate saving of data-collection symptoms to the system disks
Data Collection outputs the symptoms it has collected immediately:
To system files DCEV00 for the MP:SA.
To system files DCEV10 for the dependent MPs.
Data Collection automatically initiates the creation of DCEV files. In cases where the first
recovery interruption is followed by others, and Data Collection therefore saves further
sets of symptoms, the DCEV files are named DCEV02 to DCEV07 for the MP:SA and
DCEV11 to DCEV17 for the dependent MPs.
Editing of collected symptoms to a readable format and their output to system
disk
The saved symptoms are recorded unedited as character strings in a non-readable
format in the DCEV files. The unedited format allows the symptoms data to be transferred
rapidly from the main memory to the system disks. This high-speed transfer is
particularly important in the critical situation of recovery interruption.
Once the system has been restored to normal operating mode, the symptoms gathered
by Data Collection can be edited to a readable format for analysis and transferred to the
management system.
