Gist: they can spread too fast to deal with unless you throttle all connections. Science, Vol 304, Issue 5670, 527-529 , 23 April 2004 COMPUTER SCIENCE: Technological Networks and the Spread of Computer Viruses Justin Balthrop, Stephanie Forrest, M. E. J. Newman, Matthew M. Williamson* Computer viruses and worms are an increasing problem throughout the world. By some estimates 2003 was the worst year yet: Viruses halted or hindered operations at numerous businesses and other organizations, disrupted cash-dispensing machines, delayed airline flights, and even affected emergency call centers. The Sobig virus alone is said to have caused more than $30 billion in damage (1). And most experts agree that the damage could easily have been much worse. For example, Staniford et al. describe a worm that could infect the entire Internet in about 30 s (2). A worm of this scale and speed could bring the entire network to a halt, or worse. The term virus refers to malicious software that requires help from computer users to spread to other computers. E-mail viruses, for instance, require someone to read an e-mail message or open an attached file in order to spread. The term worm refers to infections that spread without user intervention. Because they spread unaided, worms can often spread much faster than viruses. Computer infections such as viruses and worms spread over networks of contacts between computers, with different types of networks being exploited by different types of infections. The structure of contact networks affects the rate and extent of spreading of computer infections, just as it does for human diseases (3-7); understanding this structure is thus a key element in the control of infection. Both traditional and network-based epidemiological models have been applied to computer contagion (3-5). Recent work has emphasized the effects of a network's degree distribution. A network consists of nodes or vertices connected by lines or edges, and the number of edges connected to a vertex is called its degree. Of particular interest are scale-free networks, in which the degree distribution follows a power law, where the fraction pk of vertices with degree k falls off with increasing k as k- for some constant . This structure has been reported for several technological networks, including the Internet (8) and the World Wide Web (9, 10). Infections spreading over scale-free networks are highly resilient to control strategies based on randomly vaccinating or otherwise disabling vertices. This is bad news for traditional computer virus prevention efforts, which use roughly this strategy. On the other hand, targeted vaccination, in which one immunizes the highest degree vertices, can be very effective (11, 12). These results rely crucially on the assumption that the degree distribution follows a power law, and also that the contact pattern is static. Many technological networks relevant to the spread of viruses, however, are not scale-free. Vaccination strategies focusing on highly connected network nodes are unlikely to be effective in such cases. Furthermore, network topology is not necessarily constant. In many cases the topology depends on the replication mechanism used by a virus and can be manipulated by virus writers to circumvent particular control strategies that we attempt. If, for instance, targeted vaccination strategies were found to be effective against viruses spreading over scale-free networks, viruses might be rewritten so as to change the structure of the network to some non-scale-free form instead. To make these ideas more concrete, we consider four illustrative networks, each of which is vulnerable to attack: (A) the network of possible connections between computers using the Internet Protocol (IP), (B) a network of shared administrator accounts for desktop computers, (C) a network of e-mail address books, and (D) a network of e-mail messages passed between users. In network A, each computer has a 32-bit IP address and there is a routing infrastructure that supports communication between any two addresses. We consider the network in which the nodes are IP addresses and two nodes are connected if communication is possible between the corresponding computers. Many epidemics spread over such an IP network. Notable examples include the Nimda and SQLSlammer worms. Network B is a product of the common operating system feature that allows computer system administrators to read and write data on the disks of networked machines. Some worms, including Nimda and Bugbear, can spread by copying themselves from disk to disk over this network. Network C has nodes representing users and a connection from user i to user j if j's e-mail address appears in i's address book. Many e-mail viruses use address books to spread (for example, ILoveYou). A closely related network is network D, in which the nodes represent computer users, and two users are connected if they have recently exchanged e-mail. Viruses such as Klez spread over this network. Degree distributions have been measured for examples of each of the four networks (see the figure). In network A, all vertices have the same degree, so the distribution has a single peak at this value (blue histogram). In network B, the distribution consists of four discrete peaks, presumably corresponding to different classes of computers, administrators, or administration strategies (red histogram). Infectious connections. (Left) Degree distributions for the IP (blue) and administrator (red) networks. The administrator network data were collected on a system of 518 users of 382 machines at a large corporation. Computers in this network are connected if any user has administrative privileges on both computers. (Right) Cumulative degree distributions for the e-mail address book (blue) and e-mail traffic (red) networks. The e-mail address book data were collected from a large university (7). The e-mail traffic data were collected for complete e-mail activity of a large corporate department over a 4-month period (18). The two e-mail networks have more continuous distributions and are shown as cumulative histograms. Although neither network has a power-law degree distribution, both have moderately long tails, which suggests that targeted vaccination strategies might be effective. However, calculations show that for network C, about 10% of the highest degree nodes would need to be vaccinated to prevent an epidemic from spreading (7), whereas network D would require about 80%. The first of these figures is probably too high for an effective targeted vaccination strategy, and the second is clearly far too high. (Targeted vaccination would be entirely ineffective in the other two networks as well, because the nodes are much more highly connected.) The two e-mail networks illustrate the ways in which different virus replication strategies can lead to different network topologies. An e-mail virus could look for addresses in address books, thereby spreading over a network with a topology like that of network C, or it could search through other files or folders on the machine for addresses of senders and recipients of archived e-mail messages, giving a topology more like network D. Another example is provided by the Nimda virus, which infects Web servers by targeting random IP addresses, producing a network like network A. However, if the virus had a more intelligent way of selecting IP addresses to attack (e.g., by inspecting hyperlinks), then it might spread over a topology more like that of the Web, which is believed to have a power-law degree distribution (9, 10). A control strategy is needed that is immune to changes in network topology and that does not require us to know the mechanisms of infections before an outbreak. A number of methods have been proposed (13). One such strategy is throttling, first introduced for the control of misbehaving programs (14) and recently extended to computer network connections (15). In this context, throttling limits the number of new connections a computer can make to other machines in a given time period. Because it works by limiting spreading rates rather than stopping spread altogether, the method does not completely eliminate infections but only slows them down. Frequently, however, this is all that is necessary to render a virus harmless or easily controllable by other means (16). Throttling is most effective when viruses generate traffic at a rate significantly higher than normal network communications. Luckily this is true for many common protocols and the viruses that exploit them (15, 17). For a virus to spread, it needs to propagate itself to many different machines; to spread quickly, it must do so at a high rate. For example, the Nimda worm attempts to infect Web servers at a rate of around 400 new machines per second, which greatly exceeds the normal rate of connections to new Web servers of about one per second or slower (15). A throttling mechanism that limited connections to new Web servers to about one per second would slow Nimda by a factor of 400 without affecting typical legitimate traffic. This could easily be enough to change a serious infection into a minor annoyance, which could then be eliminated by traditional means. Slowing the spread of Nimda by a factor of 400 (from a day to more than a year) would have allowed plenty of time to develop and deploy signatures and prophylactic software patches. (Of course, if throttling were implemented on only a subset of the nodes in a network, then infections could spread more easily.) In addition to reducing virus spread, throttling has the practical benefit of reducing the amount of traffic generated by an epidemic, thus reducing demand on networking equipment--often the primary symptom of an attack. Targeted vaccination strategies for the control of computer viruses are unlikely to be generally effective because the networks over which viruses spread are not sufficiently dominated by highly connected nodes, and because network topology can be influenced strongly by the way in which a virus is written. Throttling provides a promising alternative strategy that works with any network topology and can greatly reduce viruses' impact by slowing their spread to the point where they can be treated by conventional means. The disparity between the speed of computer attacks (machine and network speed) and the speed of manual response (human speed) has increased in recent years. If this trend continues, automated mechanisms like throttling will likely become an essential tool, complementing the largely manual approach of software patching in use today. The idea of rate limits is not specific to viruses, and could be applied to many situations in which an attack or cascading failure occurs faster than possible human response. References and Notes 1.Citations documenting these events are listed on www.cs.unm.edu/~judd/virus.html, as are citations to each of the viruses and worms mentioned above. 2.S. Staniford, V. Paxson, N. Weaver, in Proceedings of the USENIX Security Symposium (USENIX Association, Berkeley, CA, 2002), pp. 149-167. 3.J. O. Kephart, S. R. White, in Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy (IEEE Computer Society, Los Alamitos, CA, 1991), pp. 343-359. 4.R. Pastor-Satorras, A. Vespignani, Phys. Rev. Lett. 86, 3200 (2001) [APS]. 5.A. L. Lloyd, R. M. May, Science 292, 1316 (2001). 6.H. Ebel, L.-I. Mielsch, S. Bornholdt, Phys. Rev. E 66, 035103 (2002) [APS]. 7.M. E. J. Newman, S. Forrest, J. Balthrop, Phys. Rev. E 66, 035101 (2002) [APS]. 8.M. Faloutsos, P. Faloutsos, C. Faloutsos, Comput. Commun. Rev. 29, 251 (1999). 9.R. Albert, H. Jeong, A.-L. Barabasi, Nature 401, 130 (1999) [Abstract]. 10.J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, in Proceedings of the International Conference on Combinatorics and Computing, vol. 1627 of Lecture Notes in Computer Science (Springer, Berlin, 1999), pp. 1-18. 11.R. Albert, H. Jeong, A.-L. Barabasi, Nature 406, 378 (2000) [Medline]. 12.D. S. Callaway, M. E. J. Newman, S. H. Strogatz, D. J. Watts, Phys. Rev. Lett. 85, 5468 (2000) [APS]. 13.D. Moore, C. Shannon, G. Voelker, S. Savage, in Proceedings of the 22nd Annual Joint Conference of IEEE Computer and Communication Societies (INFOCOM) (IEEE Communications Society, New York, 2003), pp. 1901-1910. 14.A. Somayaji, S. Forrest, in Proceedings of the 9th USENIX Security Symposium (USENIX Association, Berkeley, CA, 2000), pp. 185-197. 15.M. M. Williamson, in Proceedings of ACSAC Security Conference (IEEE Computer Society, Los Alamitos, CA, 2002), pp. 61-68. 16.M. M. Williamson, Complexity 9, 34 (November- December 2003) [Abstract]. 17.M. M. Williamson, in Proceedings of ACSAC Security Conference (IEEE Computer Society, Los Alamitos, CA, 2003), pp. 76-85. 18.J. R. Tyler, D. M. Wilkinson, B. A. Huberman, in Communities and Technologies, M. Huysman, E. Wenger, V. Wulf, Eds. (Kluwer, Dordrecht, Netherlands, 2003), pp. 81-95 [publisher's information]. 19.We thank J. Gassaway for help collecting the e-mail address book data set, C. Hickman for the administrator data set, and J. Tyler and B. Huberman for the e-mail traffic data set. Supported by the James S. McDonnell Foundation, NSF, Defense Advanced Research Projects Agency, Intel Corp., and Santa Fe Institute.