Internet Holes

Internet Incident Response

Series Introduction

The Internet is now the world's most popular network and it is full of potential vulnerabilities. In this series of articles, we explore the vulnerabilities of the Internet and what you can do to mitigate them.

Background

In a previous article, I discussed the zero-tolerance approach to incident response. Since then I have heard from a substantial number of people who expressed a wide range of different feelings about this approach. Most of the opinions came in one of two forms. Those who agreed with the approach sent email to fred at all.net expressing their support of this position. With a few rare exceptions, those who opposed the approach expressed their opposition by attempting to attack our Internet site.

As a side effect of these opposition expressions, the all.net site helped detect break-ins at scores of Internet sites including at least ten Internet service providers. It was also an opportunity to exercise and hone our incident response capabilities.

Several people expressed an interest in the incident response capabilities at all.net as well as in the more general issue of what incident response capabilities are most appropriate for systems connected to the Internet. That's what this article is about.

What is an Incident?

I'll take the broadest possible interpretation. An incident is anything that happens to your information or information technology that is not desirable. This can range from someone forgetting their password to a fire in the microfilm room to a misfiled file-folder to a typo during data entry. Since we are concentrating on the Internet, our main concern is incidents involving the Internet.

From a practical point of view, detected incidents are the things we respond to, which brings up a critical question in incident response: How do we detect incidents?

I won't answer that question here, but it turns out that Internet sites with high-quality detection capabilities detect 100 times as many incidents as sites with poor detection capabilities. As a benchmark, these same sites report an average of more than one incident per day. If you detect far fewer incidents, especially if you are in charge of the Internet link at a large company, it might mean that your detection capabilities are less than ideal. As an example, at the all.net site, the following was a typical week's worth of audit trails reflective of incidents coming from the Internet:


      Some solicitation to an invalid UID at this site.

      More mail to a non-existent user (1st 3 lines abbreviated for readability)

May 14 19:34:17 all in.ftpd[25875]: refused connect from 199.45.127.109
      We generally ignore this activity unless multiple attempts take place

May 15 09:28:20 all in.ftpd[10179]: refused connect from ras221.if.scientech.com
      We generally ignore this activity unless multiple attempts take place

      More mail to a non-existent user.

ManAlMail Wed May 15 11:36:25 EDT 1996 all.net 21483 grc-ny11-16.ix.netcom.com mconn876
  502Q - Command not implemented:
ManAlMail Wed May 15 11:36:28 EDT 1996 all.net 21483 grc-ny11-16.ix.netcom.com mconn876
  502Q - Command not implemented:
ManAlMail Wed May 15 11:36:30 EDT 1996 all.net 21483 grc-ny11-16.ix.netcom.com mconn876
  502Q - Command not implemented:
ManAlMail Wed May 15 11:36:32 EDT 1996 all.net 21483 grc-ny11-16.ix.netcom.com mconn876
  502Q - Command not implemented:
ManAlMail Wed May 15 11:36:37 EDT 1996 all.net 21483 grc-ny11-16.ix.netcom.com mconn876
  502Q - Command not implemented: ls
ManAlMail Wed May 15 11:36:47 EDT 1996 all.net 21483 grc-ny11-16.ix.netcom.com mconn876
  502Q - Command not implemented:
ManAlMail Wed May 15 11:36:52 EDT 1996 all.net 21483 grc-ny11-16.ix.netcom.com mconn876
  502Q - Command not implemented:
ManAlMail Wed May 15 11:37:05 EDT 1996 all.net 21483 grc-ny11-16.ix.netcom.com mconn876
  502Q - Command not implemented: /close
      This looks like an attempt to abuse our mail reception system to issue
      Unix commands.


These audit records were automatically extracted from normal audit records and represent less than 0.25% of the audit records generated during this time frame. The extraction process is a relatively simple one. We have a list of audit records that we consider to be normal, and anything that is not in that list represents a possible incident. We also have a time-based system that detects unusual activity levels and several other tricks, but they are not relevant to the discussion at hand.

The key here is to make a finding about what is normal and acceptable and to trigger detection on anything else. By contrast, a method that looks for known anomalous behaviors will miss new attacks and unknown behaviors. There is an ongoing tuning process that eliminates more and more false positives over time, and as you can see from the comments above, some behaviors that are not really authorized are tolerated when they happen in isolation.
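The "anything not known to be normal" filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the extraction program used at all.net: the two patterns in `NORMAL_PATTERNS` and the function name `possible_incidents` are hypothetical, and a real list of normal records would be much longer and tuned over time.

```python
import re

# Hypothetical patterns describing audit records considered normal.
# A real site would build and tune this list from its own audit trails.
NORMAL_PATTERNS = [
    re.compile(r"in\.ftpd\[\d+\]: connect from "),
    re.compile(r"sendmail\[\d+\]: .* stat=Sent"),
]


def possible_incidents(lines, normal_patterns=NORMAL_PATTERNS):
    """Return every audit line that matches none of the normal patterns."""
    return [
        line
        for line in lines
        if not any(pattern.search(line) for pattern in normal_patterns)
    ]
```

Because the filter names what is acceptable rather than what is hostile, a record it has never seen before is reported by default. Tuning consists of adding patterns for records that turn out to be harmless.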

As a rule, most people handling incident response group similar events together and call a set of related events a single incident. This is generally because the events are thought to be from the same cause. For example, a single attempted login from an unauthorized remote site might be treated as an incident. If several subsequent attempts at entry come from the same remote site over a short period of time, they will usually be grouped together and treated as one larger and more serious incident. Savvy administrators develop special tools to cross-correlate events so that they don't have to manually try to figure out which events are related.
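As a rough sketch of this grouping, the following Python function merges events from the same source into one incident when they arrive within a chosen time window of each other. The one-hour default window, the `(datetime, source)` event layout, and the name `group_incidents` are assumptions made for illustration only.

```python
from datetime import timedelta


def group_incidents(events, window=timedelta(hours=1)):
    """Group (datetime, source) events into incidents.

    Events from the same source are merged into one incident as long as
    each follows the previous one by no more than `window`.
    """
    incidents = []
    open_incident = {}  # source -> most recent incident for that source
    for when, source in sorted(events):
        incident = open_incident.get(source)
        if incident is not None and when - incident[-1][0] <= window:
            incident.append((when, source))
        else:
            incident = [(when, source)]
            incidents.append(incident)
            open_incident[source] = incident
    return incidents
```

An incident with many entries deserves more attention than an isolated one, which matches the practice of ignoring a single refused connection unless more attempts follow.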

The tool we developed cross-correlates by IP address, by Class C, B, and A network, by time, and by other similar factors. We run it periodically on both current and historical data to catch long-term attack patterns. Our longest-term detection to date came from correlating related data over a period of six months. In this case we found an attacker who revisited us on a regular basis, attempting different methods of attack at a pace apparently designed to avoid shorter-range detection methods.
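Correlation by address and by enclosing network might look something like the Python sketch below. It is not the tool described above. It assumes events are `(timestamp, ip, description)` tuples and approximates Class C, B, and A networks with /24, /16, and /8 prefixes regardless of the address's actual class. The names `network_keys` and `correlate` are hypothetical.

```python
from collections import defaultdict
import ipaddress


def network_keys(ip):
    """Return the address itself plus its /24, /16, and /8 networks."""
    addr = ipaddress.IPv4Address(ip)
    keys = [str(addr)]
    for prefix in (24, 16, 8):
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        keys.append(str(network))
    return keys


def correlate(events):
    """Map each shared address or network to the events that fall in it.

    Only keys that tie together more than one event are kept, and the
    events under each key are sorted by timestamp.
    """
    groups = defaultdict(list)
    for event in events:
        for key in network_keys(event[1]):
            groups[key].append(event)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}
```

Run over months of stored records instead of a single day, the same grouping brings out a slow attacker whose individual visits are too far apart to be noticed one at a time.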

Internet Incident Response Resources

Let's now assume that you have detected an incident. The next step is your response. Response strategies can range from automated instantaneous zero-tolerance responses such as those used at all.net to simply noting the attempt and trying to make certain that it did not succeed, or if it did succeed, limiting damage, recovering from the attack, and reducing or eliminating future exploitation of the same or similar vulnerabilities.

Having said this, I should note that many sites don't do even the second sort of response. My view is that if you don't do at least the second sort of response, you are not responding to an incident, but simply observing it. The chances are, if you do less than this, you don't spend the time required to tune your detection system, you don't pay very much attention to the results of whatever detection you have in place, and you only really respond when a user tells you the network is down.

Anything in the range described above might be appropriate depending on the environment. For example, some sites are barred by law or organizational policy from any response that results in outbound traffic. Military sites that actively respond to attempted entry over international boundaries might be accused of escalation. Banks might not want to let anyone know how successful an attack has been, and this might include the administrator of the attacking site. Law enforcement might not want to alert attackers that they have been detected, and honey pots might be designed to gather information on attack techniques without scaring off the attackers.

Depending on the response strategy you decide to use, you will have different resources available to help you respond. If you don't want anyone to know that an incident has taken place, you cannot rely on outside resources except for informational value. If you don't want anyone to know about the attack, you should get response information by using access points that cannot be traced to your organization. These might include personal accounts created for your staff members using local Internet Service Providers, accounts in subsidiary or parent companies, accounts owned by security consultants that you trust, or anonymous accounts available from local free-nets, computer clubs, and similar sources. The less information you are willing to give, the less assistance you are likely to receive. This is partly because some sources of help only respond to specific incidents when detailed reports are made, and partly because it's very hard to be helpful without knowing many of the details you might not want revealed.

If even these methods reveal more than is desirable, you may have to simply disconnect until you can defend yourself properly and create an appropriate cover story in case anyone notices that your site was inaccessible. It's not unusual to hear about sites that have a power failure, a cable cut, or a telephone outage, and to the extent that it's important to conceal the attack, these are good cover stories. They don't, of course, provide you with any assistance in responding to the incident.

If you are willing to use the Internet to get assistance in a response, there are some limited resources available. For example, there are Computer Emergency Response Teams (CERTs) in the United States, Australia, and many other countries.

The name is actually a misnomer for the US team, because they don't actually do anything in response other than to help you use the tools you should already know how to use and help you contact people using databases you already have access to. The US team takes reports of incidents and uses them for their own undisclosed purposes (perhaps to get more funding). They also send you documentation on some of the free tools available from the Internet to help secure your system for the next time and give you free advice that is worth exactly what you pay for it.

The Australian team, on the other hand, seems very helpful in responding to incidents. They have, on occasion, found out who owned a system actively participating in an attack and called them in the middle of the night to have them take care of the problem. They also follow up until they find the cause and eliminate it. They have even been known to assist in the prosecution of perpetrators. If the attack involved Australia, they are a very good place to start.

Another more sophisticated and helpful team of defenders forms the FIRST group. FIRST stands for Forum of Incident Response and Security Teams, and is composed of teams of incident response experts from all over the world. These teams cooperate to track down attacks when they involve team members or subscribers. If you are a subscriber, this is probably your most important resource.

For those of us who aren't members of global response teams, there are a number of Internet mailing lists where assistance can be gained in some cases. Vendor-operated mailing lists and email support services are a good place to look if your situation is vendor-related. Other forums include the bugtraq mailing list for those who find security bugs in Unix systems, the firewalls mailing list for problems related to firewalls, the Risks forum for issues related to the broader category of computer-related risks, and forums attached to professional societies such as the ISACA. But remember, anything you post to these forums is widely disseminated to the general public, the news media, and to computer criminals of various sorts.

Last, and probably least, there are the Internet search engines such as AltaVista where, in a short period of time, you might find hundreds of pointers to articles on related subjects. This is not a very efficient way to track down specifics in most cases but sometimes it is very helpful. In one case, I used the Web to track down information on Warez sites when my site started receiving thousands of requests in a few hours by people trying to enter my site looking for things with that common label. The Web helped me understand the nature of the attempted entries and to prepare an appropriate response. In this particular case, they were looking for a repository for illegal software copies. I responded by implementing a message indicating that we neither stored nor supported the storage of such things and that their user ID and site name had been forwarded to the Software Publishers Association (SPA), a trade organization that tracks down and helps prosecute those who have illegal copies of software. The attempts quickly subsided and the SPA got a lot of valuable data on where to look for illegal copies of software. We also sent email to the systems administrators of attacking sites indicating that their users were apparently at our site looking for this software. This response generated several answers from systems administrators thanking us for helping them enforce their organizational policy against illegal software use.

Tools for Incident Response

Incident response tools generally come in two different sorts. There are the internally directed tools intended to help you prevent, detect, limit, and recover from attacks, and there are the externally directed tools intended to help you track down the source of the attack and get the perpetrator to stop attacking.

The most important internal tools are network-based access controls such as those in typical firewalls that allow you to stop the attack, and audit trails, cryptographic checksums and similar change-detection systems that allow you to determine how successful the attack was. If these tools are in place before the attack, you should be able to respond quickly and effectively in almost all cases. If they are not, it may well be too late to use them, and your only solution may be to disconnect from the Internet entirely while you recover.
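Change detection with cryptographic checksums can be sketched as follows. This is a minimal illustration, not a description of any particular product: the choice of SHA-256 and the function names `checksum`, `snapshot`, and `changed_files` are assumptions. The baseline only helps if it is recorded before the attack and stored where an intruder cannot alter it.

```python
import hashlib


def checksum(path):
    """Return the SHA-256 digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def snapshot(paths):
    """Record a checksum for every file in `paths`."""
    return {str(path): checksum(path) for path in paths}


def changed_files(baseline, current):
    """List files that were added, removed, or modified since the baseline."""
    return sorted(
        path
        for path in set(baseline) | set(current)
        if baseline.get(path) != current.get(path)
    )
```

After an incident, comparing a fresh snapshot against the stored baseline gives a first estimate of how far the attacker got.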

If you are planning on tracking down attackers, you will need a different set of tools. The most important tools are a few commonly available programs that allow you to find the IP address of the attacker, trace the IP address to a host, identify the organization that owns or controls that host, and locate the responsible party at that site. If you get only one incident per week, you may want to gather these tools and write down a sequence of steps to take in tracking down each attack individually. If you, like most Internet sites, get one or more incidents per day, it will probably be worth automating certain parts of the process with a language such as Perl. The tools I gathered over time included:

As a first step, I turned on auditing through the use of TCP wrappers to assure that I had records of virtually all attempts to use any service on my computer. These results were recorded in a file for analysis.
ip2name and name2ip - programs that translate from an IP address to a hostname and from a hostname to an IP address. I wrote my own versions of these programs over time, but before that I used slightly modified versions of commonly available tools such as ISS (the Internet Security Scanner) and Traceroute (described below) to perform this function.
With an IP address, you can determine whether the host is in a Class A, Class B, Class C, or CIDR address group, and with this information, you can contact the InterNIC electronically and ask for details on the ownership, location, contact points, and name of the network the computer belongs to. This can be achieved manually by telnetting into internic.net, by using the Web to access their web site, or by a small program that uses the whois protocol to get the information in more easily computer usable form. The whois program is available on many systems to perform this task, and if it is not available, it is easy to create by simply telnetting to the whois port on the InterNIC machine. I used netcat to automate this function in my system.
In many cases, you can't get in touch with the owner of the site causing the attack, and in still more cases, you can't get in touch with them in a short time frame. If this happens, you may decide to trace the route by which the attack might have come to your site and contact intermediate sites that are the Internet Service Providers for the sites causing the problem. The first step in doing this is to use the traceroute program. Traceroute uses the time to live field of an IP packet to trace a route back to another Internet site. This is done by first using a time to live value of 1, then 2, and so on. As each site gets the packet, it reduces the time to live value by 1. If the value reaches 0, it returns an error packet. The traceroute program keeps track of these error returns to get a list of intermediate nodes.
Once you have a list of intermediate IP nodes, you can get details on each site along the path between your site and the attacker. By starting at the node closest to the attacker, you stand the best chance of reaching someone in authority to track down and stop future attacks. For each node, you have to repeat the process described above for determining and contacting the responsible party.
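The time-to-live mechanism behind traceroute can be illustrated with a small simulation. The sketch below sends no packets: the `path` argument is a hypothetical ordered list of nodes between your site and the remote host, and the function name `simulate_traceroute` is an assumption. A real traceroute also treats the reply from the final host differently from the error returns of intermediate nodes, which this simplification ignores.

```python
def simulate_traceroute(path, max_ttl=30):
    """Discover the nodes on `path` by probing with TTL 1, 2, 3, ...

    Each node on the path reduces the remaining TTL by 1. The node at
    which it reaches 0 reports back and is recorded as one hop.
    """
    discovered = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for node in path:
            remaining -= 1
            if remaining == 0:
                discovered.append(node)  # this node returns the error
                break
        else:
            break  # the probe outlived the path, nothing left to find
        if discovered[-1] == path[-1]:
            break  # the destination itself has answered
    return discovered
```

Reversing the returned list with `list(reversed(hops))` puts the node closest to the attacker first, which is the order suggested above for contacting sites.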

Automating this process allows defenders to track down and mail to relevant sites without expending significant effort. For example, the automation we put in place allowed one person to respond to several thousand incidents in a day without undue difficulty. Most of the real effort came in interacting with remote administrators.
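The article mentions Perl, but the lookup steps can be sketched just as well in Python. This is a hedged illustration and not the automation used at all.net. The function names mirror the ip2name and name2ip programs mentioned above, while `address_class` and `whois_query` are hypothetical names. The whois server to ask is left to the caller because it depends on which registry is responsible for the address.

```python
import ipaddress
import socket


def ip2name(ip):
    """Translate an IP address to a hostname, or None if lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def name2ip(name):
    """Translate a hostname to an IP address, or None if lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def address_class(ip):
    """Return the classful network class (A to E) of an IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(ip)) >> 24
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    if first_octet < 240:
        return "D"
    return "E"


def whois_query(server, query, port=43, timeout=10.0):
    """Send one query to a whois server and return the text it sends back."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as `whois_query("whois.arin.net", "199.45.127.109")` would return the registry's record as plain text, assuming that server is the right registry for the address. Picking the contact address out of that text and composing the mail is site-specific and is left out here.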

The Human Side of Response

If you spend some time and effort trying out the available tools for Internet incident response, you will find that they are a bit clunky but usable. The real problem you will face will come from the human side of the Internet.

The history of Internet security is one of human-to-human contact between systems administrators via email. The reasonable person principle was applied, and the people were reasonable, so the network worked. Unfortunately, this situation has changed, due in large part to the enormous growth of the Internet. While Ph.D. computer science researchers can usually work out their differences regarding inter-site malicious behavior, the general population often cannot.

There are several underlying issues including, but not limited to, cultural differences, time differences, language differences, and age differences.

Cultural differences show up between different countries and subcultures within those countries. For example, systems administrators in universities in the Netherlands seem to think that it's perfectly acceptable for their users to attack other sites and will do little to prevent it. Against this attitude, I threaten to cut off all access from their site unless they are willing to listen to my complaints, and the response is generally to deal with the complaints rather than restrict their users. By contrast, Singapore has such severe laws against certain kinds of computer use that a foreigner may be liable for making information on computer security available to their citizens, and of course the citizen may be punished for partaking of that information.

Time differences often mean that systems administrators are not available until the next day at a site across the globe. A daytime attack in the United States may be launched in the middle of the night from New Zealand. When you call the site, you will probably not get a live operator. Email exchanges commonly end up with a 24-hour cycle time in this circumstance unless one site or the other operates 24 hours a day.

Language differences are not as bad for English speaking people as for everyone else because the Internet is predominantly an English language venue. Just as English is used in air traffic around the world, it is by far the most common language used in Internet traffic. But just because most administrators know some English doesn't mean they are masters of the language. Phrasing is often confusing and subtleties are quickly lost.

Age differences are most often a problem in dealings between adolescents and adults. The young explorer has a hard time understanding the corporate viewpoint and the corporate computer person may be unable to relate to the young explorer investigating new territory in the corporate network.

In the end, I seem to encounter three major types of responders: the bulldog (I'm one of them), the self-righteous indignant, and the hopelessly uncaring. The bulldog wants to track everything down and will do whatever they reasonably can to catch and punish perpetrators. The self-righteous indignant claims the right to do anything they want to do but normally doesn't grant the same right to others who may have a different viewpoint. The hopelessly uncaring will only act if forced to do so, and then in the minimal way they can get away with.

In tracking down attackers, the bulldog usually wins, but when push comes to shove, the attackers almost all get away with it because the legal systems of the world do not treat intellectual property in the same way as other sorts of property. In minimizing their effort, the hopelessly uncaring usually win, until someone gets into their systems. But then, they usually don't care even when this happens. They may even hire the attacker to make their systems more secure, a tactic that almost certainly makes their systems less secure, but then they don't care anyway. The self-righteous indignant often impede investigation on purpose. In some cases, they can be convinced by the facts, but usually they simply deny the reality of the attack and assert that it couldn't work because their security system is designed to stop it. The facts rarely sway them in these cases, and the best you can usually do is go over their head (which they will thank you for without end).

How to Improve IR on the Internet

Incident response on the Internet could be dramatically improved by three simple steps:

The first step is to create an easy way to associate a responsible party with an IP address or machine name. The Internet should support such an activity as a baseline method for addressing network problems of all sorts. This is the function of the programs and techniques listed above, and it has been automated to a certain extent for most cases, but as an infrastructure, the Internet needs to have a stronger and easier-to-use responsibility mechanism.
The second step is to have some official rules of the road. As far as the Internet is concerned, there are no rules of acceptable behavior, there are no procedures for having a malicious party removed from the Internet, and there is no reliable way to force a perpetrator to cease and desist. Some Internet supporters will assert that the very inability to control the Internet is fundamental to its operation and a key feature of its design. But free speech, even in the most liberal of nations, is limited when it comes to inciting riots and advocating violence.
The third step is to have some legal enforcement mechanism that removes people who display malicious behaviors from the Internet. The jurisdictional issues of the Internet are perhaps the most complex in the world, but if the global information infrastructure is to last, it will either have to move away from the anarchy that currently exists or everyone who uses it will have to learn how to defend themselves with force. In today's Internet, it is illegal for me to return fire when my site is attacked, but law enforcement will not arrest the attackers even when they are clearly identified and admit guilt. Only the rich and powerful can get law enforcement support, and then prosecutorial discretion is used poorly. As a net effect, those who follow the law are punished, and those who break it are rewarded, and those with power, as always, get their way.

Summary

In the Internet today, there are diverse and hard-to-use tools that allow defenders to respond to incidents in a reasonable way. The skilled defender can successfully stave off attacks, and through the use of automation, can make the process reasonably efficient.

Although the attackers would have defenders abandon automation and the old-timers would prefer that all Internet incidents were handled with the personal touch, the new reality of the Internet is that many attackers use automation to launch their attacks and the most successful defenders have developed tools to automate their defenses.

The Internet provides little in the way of efficiency for the defender, and the environment is not conducive to finding and punishing attackers. This is a formula for disaster on a global scale. The changes that may be required in order to deal with this threat are substantial as well, and it will be a daunting task to find a way for human society to deal with the issues of the first really global information infrastructure.

About The Author

Fred Cohen is a Senior Member of Technical Staff at Sandia National Laboratories and a Senior Partner of Fred Cohen and Associates in Livermore, California, an executive consulting and education group specializing in information protection. He can be reached by sending email to fred at all.net.
