Managing Network Security
Risk Management or Risk Analysis?
Copyright (c) Fred Cohen, 1997
Series Introduction
Over the last several years, computing has changed to an almost purely
networked environment, but the technical aspects of information protection
have not kept up. As a result, the success of information security programs
has increasingly become a function of our ability to make prudent management
decisions about organizational activities. Managing Network Security
takes a management view of protection and seeks to reconcile the need for
security with the limitations of technology.
Huh?
As one of my co-workers once said: "risk analysis ... risk management
... it's all the same thing." I tactically retreated and made strategic
plans to provide additional information later. This is later.
- Risk analysis - at least classical risk analysis - consists of (1)
gathering facts, assumptions, and estimates; and (2) making calculations
based on that information to generate results including expected loss and
the cost effectiveness of various mitigation techniques.
- Risk management - at least as it is commonly practiced - consists of
(1) gathering facts, assumptions, and estimates; and (2) making decisions
about which risks to take.
The risk analysis people may chime in here and tell me that risk analysis
is HOW risk management makes these decisions. I can only speak to this from
experience. I have been involved in many management decisions and I have
never seen anyone make a management decision relating to information protection
based solely on the result of a quantitative risk analysis. Risk analysis
may contribute to the decision process, but ultimately, that's not how decisions
are made.
So how are they made? If you were in my office, you would see me chuckle
as I wave my hands about in response. But before you rush off and figure
it's all smoke and mirrors, I'd better tell you that it's really not all
just hand waving. In fact, as you will see, hand waving plays just as much
a part in risk analysis for information protection as it does in risk management.
The real difference is that in risk analysis, the hand waving is hidden
within calculations, while in risk management, we wave our hands in front
of everyone and call it good (or bad) judgment.
Network Risk Analysis
Nobody really knows an exact way to analyze risks and risk mitigation
strategies in a networked environment, but by applying standard risk assessment
techniques, we can create a framework for analysis.
Standard risk analysis asserts that we calculate an expected loss (L)
by multiplying the probability of each event (p(e)) that can cause a loss
by the expected loss from that event (l(e)) and adding these results for
all of the events (all e in E).
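The calculation just described can be sketched in a few lines. The event names, probabilities, and loss figures below are illustrative assumptions, not real data:

```python
# Classical expected loss: L = sum over all events e in E of p(e) * l(e).
# Event names, probabilities, and loss figures are illustrative assumptions.
events = {
    "disk failure":   {"p": 0.10, "loss": 50_000},
    "virus incident": {"p": 0.30, "loss": 20_000},
    "insider theft":  {"p": 0.02, "loss": 500_000},
}

expected_loss = sum(e["p"] * e["loss"] for e in events.values())
print(expected_loss)  # 5000 + 6000 + 10000 = 21000.0
```

The arithmetic is trivial; as the rest of this section argues, the hard part is defending the inputs.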
Mitigation strategies are then optimized by examining each proposed mitigation
technique to derive the reduction in expected loss associated with the technique's
use, dividing by the cost of the mitigation technique to derive a return
on investment (ROI), and applying the most cost-effective (i.e., highest
ROI) method first. Apply methods until no technique with a high enough ROI
for the organization is left, and you are done.
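The greedy selection just described can be sketched as follows, with made-up techniques, costs, and loss-reduction figures, and an ROI cutoff that any real organization would have to set for itself:

```python
# Rank mitigation techniques by ROI = (reduction in expected loss) / cost,
# apply the best first, and stop when nothing clears the cutoff.
# All names and figures here are illustrative assumptions.
techniques = [
    {"name": "firewall",   "cost": 10_000, "loss_reduction": 40_000},
    {"name": "backups",    "cost": 5_000,  "loss_reduction": 30_000},
    {"name": "biometrics", "cost": 50_000, "loss_reduction": 20_000},
]
MIN_ROI = 1.0  # organization-specific threshold (assumed)

chosen = [
    t["name"]
    for t in sorted(techniques,
                    key=lambda t: t["loss_reduction"] / t["cost"],
                    reverse=True)
    if t["loss_reduction"] / t["cost"] >= MIN_ROI
]
print(chosen)  # ['backups', 'firewall'] -- biometrics (ROI 0.4) misses the cutoff
```

Note that this sketch treats each technique's loss reduction as a fixed, independent number, which is exactly the assumption the challenges below call into question.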
In theory, we can apply this to a networked system of computers by simply
enumerating all of the events for the network as a whole, determining probabilities
for each event, calculating the expected loss for each event and the ROI
for each mitigation technique, and doing the arithmetic.
It seems very simple and straightforward, but there are a few challenges
along the way. Let's look at these:
- The list of events: The events that can cause a loss
on even a single system cannot be enumerated exhaustively. This is one of the results
of the undecidability issues surrounding attacks. Since there are a potentially
infinite number of different attacks, listing them all is not possible.
People usually get around this by listing the ones they know about and
ignoring the rest. But of course attackers may not go along with this strategy.
They may use attacks you didn't list or attacks that didn't exist when
you made your list.
- The probability of events: Many, perhaps most, naturally occurring
events occur in a distribution that may be modeled to within a reasonable
degree of accuracy using the common methods of statistics. You might, for
example, use a stochastic process model to assess the probability
of an earthquake of magnitude greater than 5 occurring in London. But the same
cannot be said for man-made phenomena, particularly in the case of human
computer attackers. In fact, human attackers tend to act more like step
functions than Gaussian probability distributions. For this reason, the
basic mathematics of statistics is probably inappropriate for analyzing
malicious attacks on computer systems today.
- Event Independence: One of the most important bases of statistics
is the independence of events. For example, in assessing the risk of tornadoes,
we normally assume that they are independent of things such as earthquakes.
The likelihood of having both during the same time period is then computed
by multiplying their probabilities together. But attacks against computer
systems often involve multiple simultaneous events. In fact, based solely
on experience, it is far more likely that an attack will combine multiple
techniques than it is that a single technique will be applied. Trying to
assess the joint probabilities of events related in an unknown manner is
essentially impossible.
- Expected loss of events: It turns out that even getting an agreement
on the actual loss associated with an event after the event has taken place
is very hard. For example, the Morris Internet virus of 1988 had assessed
losses ranging from under $100,000 to over $100,000,000. That's a range
of more than three orders of magnitude! If we can't get within a factor
of 1,000 for events after they take place, how can we expect to get accurate
calculations of loss in advance? There is a substantial body of knowledge
on information valuation, including encyclopedic volumes on the subject
from the EDP Auditor's Association and others. Depending on how you assess
value, several orders of magnitude difference may be generated by an assessment.
- Mitigation techniques: Just as we can't exhaustively list attacks
ahead of time, we cannot exhaustively list mitigation techniques. There
are so many options available for risk reduction and elimination that nobody
knows about all of them.
- Reduction in expected loss: Even if we could list every possible
risk reduction technique, in order to assess the cost effectiveness of
these techniques, we need a figure for reduction in expected loss. Unfortunately,
nobody has ever come up with a valid way to do this. Consider, for example,
the reduction in expected loss involved in moving from a system requiring
at least 15 characters for a password to a system requiring at least 16
characters. How do we compute the effect on expected loss? It's not 26
times harder to guess a 16-character password than a 15-character password,
but even if it were, the difference in password guessability is not the
only factor involved. Passwords may have to be written down more frequently,
or perhaps they are even more likely to be stored in computers that automatically
contact other computers rather than remembered by the user. There will
be more mistakes in password entry, thus increasing the day-to-day investment
associated with the longer passwords. This defense will not be effective
against attackers who tap into communications or who exploit other vulnerabilities.
And the effect is not independent of other effects from other techniques
- such as using cryptography to protect the information residing within
the system. Nobody has ever come up with a viable argument for associating
a particular reduction in expected loss to a particular defensive technique.
- Sensitivity: It turns out that risk analysis can be very sensitive
to details. For example, a small change in probability or expected loss
may cause the choice of one technique over another. Once that technique
is selected, it might have cascading effects on subsequent decisions because
of the way it changes the mitigating effects of other techniques. I have
not seen risk assessment done with sensitivity analysis built in; however,
those who have tried using intervals instead of fixed values to address
this issue have found that the analysis becomes far more complex.
- Exponentiate for networks: All of these issues in risk assessment
apply to individual systems, but when we go to networks, things get far
more complex. Each computer in a computer network might contain or process
different information with different value, might be subjected to different
attacks, and might have different defenses. If we assume they are independent,
we miss catastrophic events that impact the entire network, but if we try
to calculate all combinations of events and their impacts on all combinations
of computers, we run into a substantial combinatorics problem. Furthermore,
networking introduces new classes of events, different sorts of losses,
may dramatically change the expected loss reduction associated with different
mitigation techniques, and that's just the beginning.
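Two of the points above can be put into numbers. Assuming, purely for illustration, a 26-letter lowercase alphabet, the brute-force search space for 16-character passwords is exactly 26 times that for 15-character passwords, yet that ratio says little about guessability or expected loss; and the combinatorics of networked events grows as the number of subsets of event types:

```python
# Password example: with an assumed 26-letter alphabet, each added character
# multiplies the brute-force search space by 26 -- but search-space size is
# not the same thing as guessability or expected loss.
space_15 = 26 ** 15
space_16 = 26 ** 16
print(space_16 // space_15)  # 26

# Network combinatorics: with n event types that may occur in combination,
# there are 2**n subsets of simultaneous events to consider.
for n in (10, 20, 40):
    print(n, 2 ** n)  # grows from thousands to over a trillion
```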
Nobody to my knowledge has ever performed a full risk analysis of a substantial
network, and I doubt that anyone ever will. People who claim to do network
risk analysis tend to make sweeping assumptions.
Having said all of this, I am anxious to add a note of caution. There
are a substantial number of people who believe that quantitative risk analysis
is viable in information protection and who perform this analysis with great
rigor. Quantitative risk analysis has been applied to systems and networks
of all sizes for many years. Those who believe in it have not perished and
they are firm in their convictions. The supporters of this technique are
well aware of all of the points I have made here, and they assert that they
can still do a good job despite these factors.
Network Risk Management
Risk management takes a completely different perspective on the issues
of risk. The basic idea is that everything in life is risky. You win some
and you lose some. The object is to make the wins bigger than the losses.
Instead of trying to micro-manage technical protection, risk management
seeks to make decisions about whether and when to take, avoid, or mitigate
risks and how much to spend in the process.
For the risk manager, the range of possibilities goes from "not worth
worrying about" to "we lose everything if it fails." The way risk
managers decide what sits where in this spectrum is through an understanding
of the nature of the enterprise and the role of the particular component
in the success or failure of the enterprise.
An astute reader might exclaim that this is the same thing that the risk
analysis process provides in its assessment of expected loss, except that
risk analysis has more rigor. In an ideal world, that would be true. Unfortunately,
it is very hard to encapsulate business sense in expected loss numbers.
As a partial solution, some people have tried to encapsulate business knowledge
in their risk analysis methodologies, and this has had a positive effect
on overall protection management. But I am getting ahead of myself.
Just as risk analysts have a hard time encoding business sense into their
risk analysis, it is also very hard to get many managers to understand the
risks associated with the application of information technology in their
enterprise. Indeed, there is a widespread belief in the
near-infallibility of computers. After all, if it weren't for our imperfect
human programmers, computers would always be right.
One of the side effects of not understanding risks and a general belief
in the perfection of computers is the general perception that a computer
system is right unless there is some reason to believe that it is wrong.
By extension, many people seem to believe that unless a computer displays
some indication of having been attacked, it must be secure. In systems that
go a step further and proclaim that they are operating in a secure mode
(e.g., Netscape, which displays a key when it is using cryptographic communication
and a broken key when it is not) users tend to believe that this mode is
indeed secure against all threats.
Risk managers are people too, and they sometimes fall prey to the same
misunderstandings as the general public. The real weakness in risk management
is that it is often done by people who don't understand enough about the
risks they face.
A fairly common technique in risk management is the covering approach.
In the covering approach, we create a list of attacks and a list of defenses
and identify which defense provides coverage against which attack. The goal
of risk management is then to balance coverage with organizational importance.
Importance is determined based on managers' assessments of what they are
worried about, while coverage is characterized by a qualitative statement
about the strength or nature of the coverage against each of the attacks
it applies to. In the covering approach, costs are sometimes identified
with defenses, but it is rare that expected loss is associated with attacks.
Rather, managers reason about what they are willing to protect against and
what they are not willing to protect against, hopefully considering facts
of many kinds from many sources.
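A minimal sketch of such a coverage matrix, with hypothetical attacks, defenses, and qualitative ratings chosen only for illustration:

```python
# Covering approach: map each attack to the defenses that cover it, with a
# qualitative strength rating rather than an expected-loss number.
# Attack names, defense names, and ratings are illustrative assumptions.
coverage = {
    "password guessing": {"strong passwords": "strong", "audit logs": "weak"},
    "wiretapping":       {"encryption": "strong"},
    "insider misuse":    {"audit logs": "moderate"},
    "denial of service": {},  # knowingly left uncovered by management
}

# A simple management review: which attacks have no coverage at all?
uncovered = [attack for attack, defenses in coverage.items() if not defenses]
print(uncovered)  # ['denial of service']
```

The point of the matrix is not the data structure but the record it creates: every empty row is a risk that management has explicitly chosen to accept.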
Perhaps the most important point to be made for the covering approach
is that management makes explicit decisions about what attacks are to be
covered and how much depth the defense has against each type of attack.
In the same process, they explicitly and knowingly make decisions to not
cover particular attacks or to protect against them with weak or non-redundant
coverage. It is the manager's job to understand the impact of attacks on
the organization and to make decisions over time. Managers can then ask
themselves questions like:
What can reasonably be ignored for now? What decisions can reasonably
be delayed? What can be managed if and when it occurs rather than protected
against proactively? What sorts of contingency planning should be considered
over time? What can I insure against? What expertise should I bring into
the organization to help in these areas? How will this impact other operational
decisions?
Unlike the purely numerical prescriptions given by quantitative risk
analysis, the covering approach leaves a great deal of management latitude
and involves judgments other than the association of numbers to events.
The covering approach also gives managers some things that classical risk
analysis doesn't give them:
- Knowledge about where they are placing their bets and why: Unlike numerical
methods, which lead to knowledge in a form like "the return on investment
for defense A is better than for defense B by 27.3 percent," the covering
approach leads to knowledge in a form like "defense B will cover these
three things that I think are very important and these two things that
I think are less important, and will only cost me this much, while defense
A is expensive, and I think we can manage our way out of trouble with
some temporary emergency measures if the things defense A protects against
ever come to pass."
- The ability to decide what to protect by proactive as opposed to reactive
means: I have never seen a risk analysis that considered the range of proactive
and reactive approaches to protection - and I doubt that I ever will. For
one thing, it's very hard to list all of the ideas you might be able to
come up with when necessity drives the birth of a new invention. For another
thing, I know of no way to quantify most reactive approaches in terms that
risk analysis can compute.
- The ability to make judgments about what can and cannot be managed
as it comes up: I am unaware of a computer program that can make a reasonable
judgment about what a manager can manage effectively as it comes up. On
the other hand, almost every manager I have ever met seems to have a sense
about what they could and could not handle if it came up. It also tends
to be different for different managers in different organizations.
- The ability to use their organizational understanding and judgment
in making decisions: A typical risk management decision will include factors
like "the quality of our telecommunications staff," "the way we stayed in
business eight years ago during the summer power outages could be used
to compensate for that kind of an event," and "Jerry over in accounting will
find a way to get those checks out no matter what you do to the inventory
control system's computers."
- The ability to choose methods that are better suited to the personality
of the organization and the available human resources: It is common to
find techniques that are 25 percent less cost effective on paper but that
work three times as well within a particular organization. Risk analysis
normally doesn't consider the human impacts of technological choices, while
good managers almost always do.
- The ability to use the knowledge gained in order to plan an organizational
future that will lead to long-term improvements: The risk analysis program
doesn't produce more knowledgeable managers who have a deeper understanding
of the issues underlying information protection. It may be that the benefits
of this side effect far outweigh any of the benefits of protection decisions
made by numerical analysis.
Along with the benefits of risk management approaches like the covering
approach come some potential downsides:
- Decision makers need to understand the implications of their decisions:
If decision makers don't invest the time and effort to do this, they will
make poor decisions and the organization will suffer as a result. They
will not have a computer program or technical expert to blame it on.
- Lack of quantification may lead to poor decisions in some cases: People
often make judgments that, in hindsight, are poor. In many cases these
are the result of inadequate or incorrect assessment of relative values.
This is especially true in networked environments where multipliers such
as the effects on interconnected systems may be subtle or hard to judge
without quantification.
- Risk management places a lot of burden on management awareness and
consideration: If decision makers are not kept up-to-date on the changing
information environment, if they don't personally and deeply consider the
underlying issues involved in risk management decisions, or if they use
organizational dynamics to allow decisions to be made for them, the results
may be poor.
Summary
You may be surprised to find out this late in my article that I believe
that properly done risk analysis and properly done risk management both
work well - but in different contexts.
In large organizations with well-qualified managers, strong technical
support staffs, and a high degree of awareness and technical sophistication,
risk management has proven highly effective. In organizations with weaker
technical staff, less awareness, and less technical orientation, risk analysis
provides a viable method for making reasonable decisions.
But these rules-of-thumb aside, there is one key factor in getting good
results from either technique. It is the quality of the individuals who
help guide the risk analysis/management process. It may sound like an easy
out, but it's invariably true - regardless of the techniques you choose,
more experienced people with more knowledge and expertise get better results
- and people with less experience, less knowledge, or less expertise get
poorer results. Perhaps the most important risk management or risk analysis
decision to be made is who is put in charge of the process.
About The Author
Fred Cohen is a Senior Member of Technical Staff at Sandia National Laboratories
and a Senior Partner of Fred Cohen and Associates in Livermore, California,
an executive consulting and education group specializing in information protection.
He can be reached by sending email to fred at all.net.