
Incident Management Demystified


Because a service is easily recognized as a modular way to assemble supporting facilities for operations at any scale, the concept of “services” is a hot topic at many levels of organizational production. In turn, that makes the experience of using services a top priority when considering operational and production performance.

This leads to a quick appreciation of why difficulties in service utilization become so prominent in an organization: services in effect become the “API” that is provided to the workforce for exploiting industrial-grade support of their efforts — in particular, the support provided by any enabling automation technology.

Since worker operations power the progress of maintenance, change and exploration, any expectation of advancing competitively rests on how well workers can leverage their resources. Thus, performance is inhibited by interruptions of that leverage, which means that performance is also constrained by the capability to manage incidents.

Inspired by that prominence, the topic of incident management stays in the foreground alongside most other considerations of competitive advantage. However, even though the above understanding is commonplace, somehow the matter of how to define, track and handle incidents falls into debate.

We think this ambiguity starts at the level at which organizations demand accountability, which in turn subjects the issue to the preferences or habits of the sub-organizations given responsibility along with that accountability. Many of these preferences and habits are institutionalized in local terminology, which creates the need for a frame of reference that lets different specializations recognize when they are actually talking about, or working on, the same thing.


One hugely important observation is that an incident is largely psycho-logical. Unlike a mere unspecified difficulty, which could simply be tolerated, an incident has to be noticed in order to be managed (accounted for), and it has to be defined in order to be noticed.

In general, an “incident” is:

  • a detected significant departure from …
  • a condition of an activity, …
  • where that condition was defined by the expectations of a known intent.

Yet, while detection is, by definition, always involved, reporting the detection is not. There can be a very significant difference between the presence of incidents and the awareness of those incidents by any party other than the one directly affected.
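The definition above can be sketched as a small data model. This is only an illustration under stated assumptions: the names `Expectation`, `Incident` and `detect`, the numeric `expected` and `tolerance` fields, and the idea of measuring a departure as a simple absolute difference are all hypothetical and are not part of the framework itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expectation:
    """A condition of an activity, defined by a known intent (hypothetical model)."""
    activity: str
    intent: str
    expected: float   # the prescribed condition, reduced to one number for illustration
    tolerance: float  # departures within this band are tolerated, not "significant"


@dataclass(frozen=True)
class Incident:
    """A detected significant departure from an expected condition."""
    expectation: Expectation
    observed: float
    reported: bool = False  # detection does not imply that anyone else was told


def detect(expectation: Expectation, observed: float) -> Optional[Incident]:
    """Return an Incident only when the departure exceeds the tolerance."""
    departure = abs(observed - expectation.expected)
    if departure > expectation.tolerance:
        return Incident(expectation, observed)
    return None
```

With an expectation such as `Expectation("sign-in", "workers reach the service quickly", expected=2.0, tolerance=1.0)`, an observed value of 2.5 yields no incident, while 6.0 yields an `Incident` whose `reported` flag is still `False`. Without a defined expectation there is nothing to pass to `detect`, which mirrors the point that an incident has to be defined in order to be noticed.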

In general, the management distinction of “an incident” has value in terms of the prevention of an incident and/or the response to the incident.

Prevention demands a clear view of the environmental dynamics that allow or cultivate activity without incidents. Options are mainly in the design, distribution, and prerequisites of the contact between workers and their various resources.

Response requires understanding, weighing and prioritizing relevant options to regain (resolve) a systemic stability in the interactions that accompany normal uninterrupted effectiveness.  Options are:

  • Circumstantial – a recovery of superficial user progress for the moment
  • Relational – a restoration of prescribed reliability on an underlying dependency
  • Fundamental – a re-engineering of the intended permanent infrastructure

Different parties should be able to map their existing roles and scope of authority to the design/distribution/prerequisites of prevention, or likewise to the recovery/restoration/re-engineering of response.
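As a rough sketch of that mapping, the prevention and response options can be written as two enumerations, with each party's scope of authority expressed as a set of options. The role names and their assignments below are invented for illustration; only the six option names come from the text.

```python
from enum import Enum


class Prevention(Enum):
    DESIGN = "design"
    DISTRIBUTION = "distribution"
    PREREQUISITES = "prerequisites"


class Response(Enum):
    RECOVERY = "circumstantial"    # recover user progress for the moment
    RESTORATION = "relational"     # restore reliability of an underlying dependency
    REENGINEERING = "fundamental"  # re-engineer the intended permanent infrastructure


# Hypothetical roles and scopes of authority, for illustration only.
ROLE_SCOPE = {
    "service desk": {Response.RECOVERY},
    "platform operations": {Response.RESTORATION, Prevention.DISTRIBUTION},
    "architecture": {Response.REENGINEERING, Prevention.DESIGN},
    "training": {Prevention.PREREQUISITES},
}


def roles_for(option):
    """Return the roles whose scope covers the given option, sorted by name."""
    return sorted(role for role, scope in ROLE_SCOPE.items() if option in scope)


def uncovered():
    """Return the prevention or response options that no role currently covers."""
    covered = set().union(*ROLE_SCOPE.values())
    return [option for option in (*Prevention, *Response) if option not in covered]
```

Calling `roles_for(Response.RESTORATION)` returns the parties able to act on a relational response, and `uncovered()` lists any option for which no party has authority, which is one place where accountability can become ambiguous.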


To ensure a common perspective across those multiple points of view, we start from the fact that the operational environment discovered and exploited by workers is to some degree found ad hoc (which creates opportunities), and to some degree commissioned by design (which creates expectations). We also recognize that the environment at hand is typically far more extensive than any worker’s practical familiarity with it. Consequently, there is a degree of uncertainty about what will occur, while there is a degree of prescription about what should occur. This is the normal condition in which incidents are found.

The following lays out a common-sense understanding of how to recognize incidents and approach them from both the proactive and responsive standpoints, simultaneously. That kind of recognition lets the organization work on the environment continually, systematically comparing intents to actuals in a closed loop of service production, deployment and feedback. In this view, any whole or part of a service is a logical item to be addressed, since a service may itself be composed of other services, and the exposure of any service may be specifically significant at the whole or component level.

This version of the framework predicts the possibility of 49 generic types of incidents, without any binding reference to a particular organizational entity or infrastructure. Example: From one moment to the next, the likelihood of an “access error” may be greater or lesser than that of an “output omission”, but standard guidance predicts that both can occur depending on prevention and response.

Incident Definitions Framework


The Next Normal

The next normal arrives when a new system replaces the old system in both its role and its opportunity as the preferred one to use.

A “system” occurs when a set of interacting items routinely takes on a certain group behavior: each of a critical number of elements acts, both consistently and persistently, primarily through its interactions with the others.

The routine behavior (i.e. the form) of the system occurs when the system is in a state of dynamic equilibrium, not just static configuration.

When a routine behavior consistently takes the place of a predecessor, the newer routine becomes the next “normal”.

In the next normal, new patterns of interaction among the system’s internal elements are both more sustainable than the preceding patterns and preferable to them.

The next normal occurs when two things happen.

One: an alternative system’s effectiveness becomes statistically predominant over an older system’s effectiveness. The difference may occur by force (causality) or by choice (attraction), leading to its potential predominance.  Impacts are the outcomes of interactions. Impacts are identified by types, not by levels. They can be forces, states, or objects. Effectiveness is the influence of the impacts.

Two: an alternative system becomes a candidate for “normal” because the compatibilities of its internal elements are more likely to persist than the incumbent system’s. They become persistent on a case-by-case basis, eventually reaching a critical mass of collective presence. The origins of the persistence may also be either authoritative or opportunistic.

Influence, however, may be circumstantial; and presence may be episodic. In both cases there must be a reason why the older system is vulnerable enough to be replaced.

An organization such as a company, a market, or an entire community can be a system… A system’s supportability is particularly sensitive to priorities. Priorities typically relate to competition, cooperation, or cohesion — the level of interaction on which changes originate. As support factors, those interactions correspond approximately to advantage, competency, and protection — the measured variables representing the priorities in the system.

Changes underlying the priorities have upstream influence.  Within a system, one’s own actions and the actions of other parties have consequences that either reinforce or undermine the priorities.

The Next Normal – Vulnerability Factors


Variations in inhibitions and encouragement alter support of the priorities; priorities support the compatibilities of system elements. Therefore, variations in the underlying factors potentially change the equilibrium and the further predominance of the system. That change will invite a renovation of the system or deference to another (successor) system.

The most likely instigator of change is demand. The pressure of demand comes from how it amplifies some priorities at the expense of others. Then:

  • If demand alters the behavior of an organization, it may affect the equilibrium of related systems.
  • If a system becomes unstable, alternative interactions can find success and instigate rearrangement of elements within and around the originals, to favor new preferences in demand.
  • Consistent support of new interactions can mature into making the alternatives the next normal.