Managing Network Security
Risk Management or Risk Analysis?
Copyright (c) Fred Cohen, 1997
Series Introduction
Over the last several years, computing has changed to an almost purely
networked environment, but the technical aspects of information protection
have not kept up. As a result, the success of information security programs
has increasingly become a function of our ability to make prudent management
decisions about organizational activities. Managing Network Security
takes a management view of protection and seeks to reconcile the need for
security with the limitations of technology.
Huh?
As one of my co-workers once said: "risk analysis ... risk management
... it's all the same thing." I tactically retreated and made strategic
plans to provide additional information later. This is later.
- Risk analysis - at least classical risk analysis - consists of (1)
gathering facts, assumptions, and estimates; and (2) making calculations
based on that information to generate results including expected loss and
the cost effectiveness of various mitigation techniques.
- Risk management - at least as it is commonly practiced - consists of
(1) gathering facts, assumptions, and estimates; and (2) making decisions
about which risks to take.
The risk analysis people may chime in here and tell me that risk analysis
is HOW risk management makes these decisions. I can only speak to this from
experience. I have been involved in many management decisions and I have
never seen anyone make a management decision relating to information protection
based solely on the result of a quantitative risk analysis. Risk analysis
may contribute to the decision process, but ultimately, that's not how decisions
are made.
So how are they made? If you were in my office, you would see me chuckle
as I wave my hands about in response. But before you rush off and figure
it's all smoke and mirrors, I'd better tell you that it's really not all
just hand waving. In fact, as you will see, hand waving plays just as much
a part in risk analysis for information protection as it does in risk management.
The real difference is that in risk analysis, the hand waving is hidden
within calculations, while in risk management, we wave our hands in front
of everyone and call it good (or bad) judgment.
Network Risk Analysis
Nobody really knows an exact way to analyze risks and risk mitigation
strategies in a networked environment, but by applying standard risk assessment
techniques, we can create a framework for analysis.
Standard risk analysis asserts that we calculate an expected loss (L)
by multiplying the probability of each event (p(e)) that can cause a loss
by the expected loss from that event (l(e)) and adding these results for
all of the events (all e in E).
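The calculation just described can be sketched in a few lines. The event names, probabilities, and loss figures below are illustrative assumptions, not real data:

```python
# Classical expected loss: L = sum over all events e in E of p(e) * l(e).
# Event names, probabilities, and loss figures are illustrative assumptions.
events = {
    "disk failure":   {"p": 0.10, "loss": 50_000},
    "virus incident": {"p": 0.30, "loss": 20_000},
    "insider theft":  {"p": 0.02, "loss": 500_000},
}

expected_loss = sum(e["p"] * e["loss"] for e in events.values())
print(expected_loss)  # 5000 + 6000 + 10000 = 21000.0
```

The arithmetic is trivial; as the rest of this section argues, the hard part is defending the inputs.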
Mitigation strategies are then optimized by examining each proposed mitigation
technique to derive the reduction in expected loss associated with the technique's
use, dividing by the cost of the mitigation technique to derive a return
on investment (ROI), and applying the most cost-effective (i.e., highest
ROI) method first. Apply methods until no technique with a high enough ROI
for the organization is left, and you are done.
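The greedy selection just described can be sketched as follows, with made-up techniques, costs, and loss-reduction figures, and an ROI cutoff that any real organization would have to set for itself:

```python
# Rank mitigation techniques by ROI = (reduction in expected loss) / cost,
# apply the best first, and stop when nothing clears the cutoff.
# All names and figures here are illustrative assumptions.
techniques = [
    {"name": "firewall",   "cost": 10_000, "loss_reduction": 40_000},
    {"name": "backups",    "cost": 5_000,  "loss_reduction": 30_000},
    {"name": "biometrics", "cost": 50_000, "loss_reduction": 20_000},
]
MIN_ROI = 1.0  # organization-specific threshold (assumed)

chosen = [
    t["name"]
    for t in sorted(techniques,
                    key=lambda t: t["loss_reduction"] / t["cost"],
                    reverse=True)
    if t["loss_reduction"] / t["cost"] >= MIN_ROI
]
print(chosen)  # ['backups', 'firewall'] -- biometrics (ROI 0.4) misses the cutoff
```

Note that this sketch treats each technique's loss reduction as a fixed, independent number, which is exactly the assumption the challenges below call into question.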
In theory, we can apply this to a networked system of computers by simply
enumerating all of the events for the network as a whole, determining probabilities
for each event, calculating the expected loss for each event and the ROI
for each mitigation technique, and doing the arithmetic.
It seems very simple and straightforward, but there are a few challenges
along the way. Let's look at these:
- The list of events: The events that can cause a loss
on even a single system cannot be enumerated exhaustively. This is one of the results
of the undecidability issues surrounding attacks. Since there are a potentially
infinite number of different attacks, listing them all is not possible.
People usually get around this by listing the ones they know about and
ignoring the rest. But of course attackers may not go along with this strategy.
They may use attacks you didn't list or attacks that didn't exist when
you made your list.
- The probability of events: Many, perhaps most, naturally occurring
events occur in a distribution that may be modeled to within a reasonable
degree of accuracy using the common methods of statistics. You might, for
example, use a stochastic process model to assess the probability
of an earthquake of magnitude greater than 5 occurring in London. But the same
cannot be said for man-made phenomena, particularly in the case of human
computer attackers. In fact, human attackers tend to act more like step
functions than Gaussian probability distributions. For this reason, the
basic mathematics of statistics is probably inappropriate for analyzing
malicious attacks on computer systems today.
- Event Independence: One of the most important bases of statistics
is the independence of events. For example, in assessing the risk of tornadoes,
we normally assume that they are independent of things such as earthquakes.
The likelihood of having both during the same time period is then computed
by multiplying their probabilities together. But attacks against computer
systems often involve multiple simultaneous events. In fact, based solely
on experience, it is far more likely that an attack will combine multiple
techniques than it is that a single technique will be applied. Trying to
assess the joint probabilities of events related in an unknown manner is
essentially impossible.
- Expected loss of events: It turns out that even getting an agreement
on the actual loss associated with an event after the event has taken place
is very hard. For example, the Morris Internet virus of 1988 had assessed
losses ranging from under $100,000 to over $100,000,000. That's a range
of more than three orders of magnitude! If we can't get within a factor
of 1,000 for events after they take place, how can we expect to get accurate
calculations of loss in advance? There is a substantial body of knowledge
on information valuation, including encyclopedic volumes on the subject
from the EDP Auditor's Association and others. Depending on how you assess
value, several orders of magnitude difference may be generated by an assessment.
- Mitigation techniques: Just as we can't exhaustively list attacks
ahead of time, we cannot exhaustively list mitigation techniques. There
are so many options available for risk reduction and elimination that nobody
knows about all of them.
- Reduction in expected loss: Even if we could list every possible
risk reduction technique, in order to assess the cost effectiveness of
these techniques, we need a figure for reduction in expected loss. Unfortunately,
nobody has ever come up with a valid way to do this. Consider, for example,
the reduction in expected loss involved in moving from a system requiring
at least 15 characters for a password to a system requiring at least 16
characters. How do we compute the effect on expected loss? It's not 26
times harder to guess a 16-character password than a 15-character password,
but even if it were, the difference in password guessability is not the
only factor involved. Passwords may have to be written down more frequently,
or perhaps they are even more likely to be stored in computers that automatically
contact other computers rather than remembered by the user. There will
be more mistakes in password entry, thus increasing the day-to-day investment
associated with the longer passwords. This defense will not be effective
against attackers who tap into communications or who exploit other vulnerabilities.
And the effect is not independent of other effects from other techniques
- such as using cryptography to protect the information residing within
the system. Nobody has ever come up with a viable argument for associating
a particular reduction in expected loss to a particular defensive technique.
- Sensitivity: It turns out that risk analysis can be very sensitive
to details. For example, a small change in probability or expected loss
may cause the choice of one technique over another. Once that technique
is selected, it might have cascading effects on subsequent decisions because
of the way it changes the mitigating effects of other techniques. I have
not seen risk assessment done with sensitivity analysis built in; however,
those who have tried using intervals instead of fixed values to address
this issue have found that the analysis becomes far more complex.
- Exponentiate for networks: All of these issues in risk assessment
apply to individual systems, but when we go to networks, things get far
more complex. Each computer in a computer network might contain or process
different information with different value, might be subjected to different
attacks, and might have different defenses. If we assume they are independent,
we miss catastrophic events that impact the entire network, but if we try
to calculate all combinations of events and their impacts on all combinations
of computers, we run into a substantial combinatorics problem. Furthermore,
networking introduces new classes of events, different sorts of losses,
may dramatically change the expected loss reduction associated with different
mitigation techniques, and that's just the beginning.
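Two of the points above can be put into numbers. Assuming, purely for illustration, a 26-letter lowercase alphabet, the brute-force search space for 16-character passwords is exactly 26 times that for 15-character passwords, yet that ratio says little about guessability or expected loss; and the combinatorics of networked events grows as the number of subsets of event types:

```python
# Password example: with an assumed 26-letter alphabet, each added character
# multiplies the brute-force search space by 26 -- but search-space size is
# not the same thing as guessability or expected loss.
space_15 = 26 ** 15
space_16 = 26 ** 16
print(space_16 // space_15)  # 26

# Network combinatorics: with n event types that may occur in combination,
# there are 2**n subsets of simultaneous events to consider.
for n in (10, 20, 40):
    print(n, 2 ** n)  # grows from thousands to over a trillion
```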
Nobody to my knowledge has ever performed a full risk analysis of a substantial
network, and I doubt that anyone ever will. People who claim to do network
risk analysis tend to make sweeping assumptions.
Having said all of this, I am anxious to add a note of caution. There
are a substantial number of people who believe that quantitative risk analysis
is viable in information protection and who perform this analysis with great
rigor. Quantitative risk analysis has been applied to systems and networks
of all sizes for many years. Those who believe in it have not perished and
they are firm in their convictions. The supporters of this technique are
well aware of all of the points I have made here, and they assert that they
can still do a good job despite these factors.
Network Risk Management
Risk management takes a completely different perspective on the issues
of risk. The basic idea is that everything in life is risky. You win some
and you lose some. The object is to make the wins bigger than the losses.
Instead of trying to micro-manage technical protection, risk management
seeks to make decisions about whether and when to take, avoid, or mitigate
risks and how much to spend in the process.
For the risk manager, the range of possibilities goes from "not worth
worrying about" to "we lose everything if it fails." The way risk
managers decide what sits where in this spectrum is through an understanding
of the nature of the enterprise and the role of the particular component
in the success or failure of the enterprise.
An astute reader might exclaim that this is the same thing that the risk
analysis process provides in its assessment of expected loss, except that
risk analysis has more rigor. In an ideal world, that would be true. Unfortunately,
it is very hard to encapsulate business sense in expected loss numbers.
As a partial solution, some people have tried to encapsulate business knowledge
in their risk analysis methodologies, and this has had a positive effect
on overall protection management. But I am getting ahead of myself.
Just as risk analysts have a hard time encoding business sense into their
risk analysis, it is also very hard to get many managers to understand the
risks associated with the application of information technology in their
enterprise. Indeed, there is a widespread belief in the
near-infallibility of computers. After all, if it weren't for our imperfect
human programmers, computers would always be right.
One of the side effects of not understanding risks and a general belief
in the perfection of computers is the general perception that a computer
system is right unless there is some reason to believe that it is wrong.
By extension, many people seem to believe that unless a computer displays
some indication of having been attacked, it must be secure. In systems that
go a step further and proclaim that they are operating in a secure mode
(e.g., Netscape, which displays a key when it is using cryptographic communication
and a broken key when it is not) users tend to believe that this mode is
indeed secure against all threats.
Risk managers are people too, and they sometimes fall prey to the same
misunderstandings as the general public. The real weakness in risk management
is that it is often done by people who don't understand enough about the
risks they face.
A fairly common technique in risk management is the covering approach.
In the covering approach, we create a list of attacks and a list of defenses
and identify which defense provides coverage against which attack. The goal
of risk management is then to balance coverage with organizational importance.
Importance is determined based on managers' assessments of what they are
worried about, while coverage is characterized by a qualitative statement
about the strength or nature of the coverage against each of the attacks
it applies to. In the covering approach, costs are sometimes identified
with defenses, but it is rare that expected loss is associated with attacks.
Rather, managers reason about what they are willing to protect against and
what they are not willing to protect against, hopefully considering facts
of many kinds from many sources.
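A minimal sketch of such a coverage matrix, with hypothetical attacks, defenses, and qualitative ratings chosen only for illustration:

```python
# Covering approach: map each attack to the defenses that cover it, with a
# qualitative strength rating rather than an expected-loss number.
# Attack names, defense names, and ratings are illustrative assumptions.
coverage = {
    "password guessing": {"strong passwords": "strong", "audit logs": "weak"},
    "wiretapping":       {"encryption": "strong"},
    "insider misuse":    {"audit logs": "moderate"},
    "denial of service": {},  # knowingly left uncovered by management
}

# A simple management review: which attacks have no coverage at all?
uncovered = [attack for attack, defenses in coverage.items() if not defenses]
print(uncovered)  # ['denial of service']
```

The point of the matrix is not the data structure but the record it creates: every empty row is a risk that management has explicitly chosen to accept.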
Perhaps the most important point to be made for the covering approach
is that management makes explicit decisions about what attacks are to be
covered and how much depth the defense has against each type of attack.
In the same process, they explicitly and knowingly make decisions to not
cover particular attacks or to protect against them with weak or non-redundant
coverage. It is the manager's job to understand the impact of attacks on
the organization and to make decisions over time. Managers can then ask
themselves questions like:
What can reasonably be ignored for now? What decisions can reasonably
be delayed? What can be managed if and when it occurs rather than protected
against proactively? What sorts of contingency planning should be considered
over time? What can I insure against? What expertise should I bring into
the organization to help in these areas? How will this impact other operational
decisions?
Unlike the purely numerical prescriptions given by quantitative risk
analysis, the covering approach leaves a great deal of management latitude
and involves judgments other than the association of numbers to events.
The covering approach also gives managers some things that classical risk
analysis doesn't give them:
- Knowledge about where they are placing their bets and why: Unlike numerical
methods, which lead to knowledge in a form like "the return on investment
for defense A is better than for defense B by 27.3 percent," the covering
approach leads to knowledge in a form like "defense B will cover these
three things that I think are very important and these two things that
I think are less important, and will only cost me this much, while defense
A is expensive, and I think we can manage our way out of trouble with
some temporary emergency measures if the things defense A protects against
ever come to pass."
- The ability to decide what to protect by proactive as opposed to reactive
means: I have never seen a risk analysis that considered the range of proactive
and reactive approaches to protection - and I doubt that I ever will. For
one thing, it's very hard to list all of the ideas you might be able to
come up with when necessity drives the birth of a new invention. For another
thing, I know of no way to quantify most reactive approaches in terms that
risk analysis can compute.
- The ability to make judgments about what can and cannot be managed
as it comes up: I am unaware of a computer program that can make a reasonable
judgment about what a manager can manage effectively as it comes up. On
the other hand, almost every manager I have ever met seems to have a sense
about what they could and could not handle if it came up. It also tends
to be different for different managers in different organizations.
- The ability to use their organizational understanding and judgment
in making decisions: A typical risk management decision will include factors
like "the quality of our telecommunications staff," "the way we stayed in
business eight years ago during the summer power outages could be used
to compensate for that kind of an event," and "Jerry over in accounting will
find a way to get those checks out no matter what you do to the inventory
control system's computers."
- The ability to choose methods that are better suited to the personality
of the organization and the available human resources: It is common to
find techniques that are 25 percent less cost effective on paper but that
work three times as well within a particular organization. Risk analysis
normally doesn't consider the human impacts of technological choices, while
good managers almost always do.
- The ability to use the knowledge gained in order to plan an organizational
future that will lead to long-term improvements: The risk analysis program
doesn't produce more knowledgeable managers who have a deeper understanding
of the issues underlying information protection. It may be that the benefits
of this side effect far outweigh any of the benefits of protection decisions
made by numerical analysis.
Along with the benefits of risk management approaches like the covering
approach come some potential downsides:
- Decision makers need to understand the implications of their decisions:
If decision makers don't invest the time and effort to do this, they will
make poor decisions and the organization will suffer as a result. They
will not have a computer program or technical expert to blame it on.
- Lack of quantification may lead to poor decisions in some cases: People
often make judgments that, in hindsight, are poor. In many cases these
are the result of inadequate or incorrect assessment of relative values.
This is especially true in networked environments where multipliers such
as the effects on interconnected systems may be subtle or hard to judge
without quantification.
- Risk management places a lot of burden on management awareness and
consideration: If decision makers are not kept up-to-date on the changing
information environment, if they don't personally and deeply consider the
underlying issues involved in risk management decisions, or if they use
organizational dynamics to allow decisions to be made for them, the results
may be poor.
Summary
You may be surprised to find out this late in my article that I believe
that properly done risk analysis and properly done risk management both
work well - but in different contexts.
In large organizations with well-qualified managers, strong technical
support staffs, and a high degree of awareness and technical sophistication,
risk management has proven highly effective. In organizations with weaker
technical staff, less awareness, and less technical orientation, risk analysis
provides a viable method for making reasonable decisions.
But these rules-of-thumb aside, there is one key factor in getting good
results from either technique. It is the quality of the individuals who
help guide the risk analysis/management process. It may sound like an easy
out, but it's invariably true - regardless of the techniques you choose,
more experienced people with more knowledge and expertise get better results
- and people with less experience, less knowledge, or less expertise get
poorer results. Perhaps the most important risk management or risk analysis
decision to be made is who is put in charge of the process.
About The Author
Fred Cohen is a Senior Member of Technical Staff at Sandia National Laboratories
and a Senior Partner of Fred Cohen and Associates in Livermore, California,
an executive consulting and education group specializing in information protection.
He can be reached by sending email to fred at all.net.