F. P. Kelly
Statistical Laboratory, University of Cambridge
16 Mill Lane, Cambridge CB2 1SB, UK
How should flows through a network be organized, so that the network responds sensibly to failures and overloads? The question is currently of considerable technological importance in connection with the development of computer and telecommunication networks, while in various other forms it has a long history in the fields of physics and economics. In all of these areas there is interest in how simple, local rules, often involving random actions, can produce coherent and purposeful behaviour at the macroscopic level. This talk describes some examples from these various fields, and indicates how analogies with fundamental concepts such as energy and price can provide powerful insights into the design of routing schemes for communication networks.
Modern computer and telecommunication networks are able to respond to randomly fluctuating demands and failures by rerouting traffic and by reallocating resources. They are able to do this so well that, in many respects, large scale networks appear as coherent, almost intelligent, organisms. The design and control of such networks require an understanding of a variety of fundamental issues and this is providing an important stimulus to many areas of mathematics and engineering.
To give one example of current importance, a major practical and theoretical issue concerns the extent to which control can be decentralized. Over a period of time the form of the network or the demands placed on it may change and routings may need to respond accordingly. It is rarely the case, however, that there should be a central decisionmaking processor, deciding upon these responses. Such a centralized processor, even if it were itself completely reliable and could cope with the complexity of the computational task involved, would have its lines of communication through the network vulnerable to delays and failures. Rather, decisionmaking should be decentralized and of a simple form: the challenge is to understand how such decentralized decisionmaking can be organized so that the network as a whole reacts intelligently to outside stimuli.
The behaviour of large scale systems has been of great interest to mathematicians for over a century, with many examples coming from physics. The behaviour of a gas can be described at the microscopic level in terms of the position and velocity of each molecule. At this level of detail a molecule's velocity appears as a random process, with a stationary distribution found by Maxwell (and later discussed by Erlang [8]). Consistent with this detailed microscopic description of the system is macroscopic behaviour best described by quantities such as temperature and pressure. Similarly the behaviour of electrons in an electrical network can be described in terms of random walks, and yet this simple description at the microscopic level leads to rather sophisticated behaviour at the macroscopic level: the pattern of potentials in a network of resistors is just such that it minimizes heat dissipation for a given level of current flow [33]. The local, random behaviour of the electrons causes the network as a whole to solve a rather complex optimization problem.
Of course simple local rules may lead to poor system behaviour if the rules are the wrong ones. Road traffic networks provide a chastening example of this. Braess' paradox describes how if a new road is added to a congested network, the average speed of traffic may fall rather than rise, and indeed everyone's journey time may lengthen. The paradox may actually have occurred during `development' in the centre of Stuttgart [23], and counterintuitive consequences of road closures are often reported [27]. The attempts of individuals to do the best for themselves lead to everyone suffering. It is possible to alter the local rules, by the imposition of appropriate tolls, so that the network behaves more sensibly, and indeed road traffic networks have long provided a key example of the economic principle that externalities need to be appropriately penalized if the invisible hand is to lead to optimal behaviour ([28], [35]).
A telephone network provides a fascinating example of a large scale system where strange effects can occur. For instance, suppose that `intelligent' exchanges react to blocked routes by rerouting calls along more resourceintensive paths. This in turn may cause later calls to be rerouted, and the cascade effect may lead to a catastrophic change in the network's behaviour. When exchanges strive to be efficient there is a possibility they may overdo it. In some respects the network's behaviour resembles water boiling. Just as a small change in the temperature of a body of water can cause a pronounced macroscopic effect, so a small change in the load on a network can produce an unexpected and massive failure. This discussion indicates the care that must be taken with the development of routing rules for large networks.
There is currently considerable interest in the similarities between complex systems from diverse areas of physics, economics and biology, and it is clear that the study of topics such as noisy optimization and adaptive learning provide mathematical metaphors of value across many fields. The reader is referred to [3], [24] and [29] for the lively and thoughtprovoking proceedings of a series of workshops, and to [38] for a study of statistical equilibrium in systems of interacting components that is both broad and penetrating. Our aim in this talk is, by comparison, much narrower: to describe within a common framework some examples from three areas, and thus to indicate how earlier, often much earlier, work on electrical networks and traffic flow has influenced recent work on routing schemes for communication networks. The talk is based on a longer paper which has appeared as [19].
It is possible to develop rather quickly the connections between random walks, electrical networks and variational principles. We describe how an interacting particle system, precisely defined at a microscopic level in terms of simple, randomized, local rules, can also be described at an intermediate level of aggregation in terms of Ohm's Law and Kirchhoff's equations, and at a macroscopic level in terms of variational principles. Thus Thompson's principle states that the flow pattern of current within an electrical network is that which minimizes energy dissipation over all flow patterns achieving the same total current flow.
The interacting particle system we describe is due to Kingman [22], and our treatment of the associated optimization problem derives from Whittle [37]. For a beautifully written account of the area, and of its history back to Lords Rayleigh and Kelvin, the reader is referred to Doyle and Snell [7].
Thompson's principle is an extremely useful and suggestive result. It follows directly that if a link is added to an electrical network and the same total current is carried then the energy dissipation is reduced: after all, the old flow pattern remains feasible for the new optimization problem. Might it be possible to design telecommunication networks similarly, so that, for example, additions of capacity are necessarily helpful? Before considering this question in detail, we look first at queueing networks, and the ways in which implicit optimization can go wrong.
Next we consider multiclass flow models, including queueing and road traffic networks. If customers or drivers can choose their routes, then exercise of this choice may drive the system to a competitive equilibrium. If drivers attempt to minimize their own delay the resulting equilibrium flows will minimize a certain ob jective function defined for the network. However the ob jective function is certainly not the average network delay: this is dramatically illustrated by Braess' paradox [4], outlined above. We describe a variant of this paradox: if drivers are provided with extra information about random delays ahead, the outcome may well be a new equilibrium in which delays are increased for everyone.
Traffic dependent tolls are sufficient to force the system to an equilibrium which minimizes average network delay: the tolls charge drivers for the delays they cause to others. The study of appropriate tolls has long been a topic of central importance in economics, and it is interesting to note that Pigou ([28], p.194) used a simple twonode twolink traffic network to illustrate that taxation could `create an "artificial" situation superior to the "natural" one'. (We note in passing that developments in electronics now make feasible the practical application of both route guidance and road pricing: [34].) Potts and Oliver [30] survey the characterization of equilibrium flow patterns as extremal values, and Nagurney [26] provides a recent review of competitive equilibrium problems including general traffic network models and related models of economic markets.
In the remainder of the talk we outline more recent work on the modelling of telecommunications networks. We show how the microscopic description of a telephone network, in terms of random arrival streams and rules for accepting and routing calls, leads the network to behave as if it is attempting to minimize a certain potential function. However, just as in the road traffic network example considered earlier, the potential function implicitly minimized bears little relation to the network performance criteria of interest to system designers. Indeed Wroe et al. [39] have described an example very similar to Braess' paradox, that has arisen in the design of the BT international access network, where the addition of capacity to a network causes the performance to become worse.
Various other forms of perverse behaviour can be interpreted in terms of an implicit minimization: for example, in situations where alternative routing causes the potential function to have multiple minima, a slight change in traffic loads may cause the network to jump catastrophically between minima ([1], [2], [11]).
We next discuss how explicit consideration of such criteria can lead, through notions of shadow prices and implied costs, to decentralized adaptive routing schemes which are at least attempting to optimize the right function.
In a loss network an increase in the offered traffic along a particular route will increase the blocking at links along that route; this in turn will affect the traffic carried along other routes through these links, and knockon effects may propagate throughout the network. As in the case of a queueing network, it is possible to define a shadow price for a link, but this now depends not just upon the traffic statistics for the link but also upon the shadow prices of adjacent other links within the network. These shadow prices can be recast as an implied cost for a link, measuring the expected net effect upon the performance of the network of using one circuit from that link, and a surplus value for a route, measuring the expected net effect upon the performance of the network of accepting a call on that route. By averaging shadow prices over time or over differing traffic conditions, we can assess the effect on the network of adding additional capacity to the various links of the network. Or we can use a real time distributed computation to estimate implied costs and surplus values, and to drive a decentralized routing algorithm.
Important stability questions arise: the effect of a local disturbance to capacity or routing may diminish rapidly with distance from the point of disturbance, or it may not. This indicates the importance of treating a network as a whole: the apparent local benefits of a capacity or routing change may be completely negated by adverse consequences elsewhere in the network. The results described are taken from [16], [18].
Might it be possible to choose the microscopic rules governing the behaviour of a telephone network, the rules for accepting and routing calls, so that the network is implicitly optimizing sensible performance criteria, much as current flow in an electrical network is implicitly minimizing heat dissipation? This is a difficult and wideranging question, but at least in some circumstances it can be answered positively. We describe a scheme, Dynamic Alternative Routing, which has been developed for the British telephone network by researchers at Cambridge and at British Telecom's laboratories at Martlesham. The scheme, known as Dynamic Alternative Routing (DAR), is now being implemented in the UK trunk network ([10], [11], [21], [32]). The scheme will lessen the impact of forecasting errors, make better use of spare capacity and respond robustly to failures and overloads. It will also permit flexible use of network resources enabling, for example, the rapid introduction of new services where demand is often uncertain. DAR uses very simple rules across the network, making constructive use of inherent randomness to search out good routing patterns. In this way the network itself operates as a distributed computer, executing a highly parallel randomized algorithm to solve a complex optimization problem.
The routing scheme described in Section 5 has the advantage that it makes no special assumptions about the underlying topology of the network, but it is adaptive rather than dynamic. DAR, in comparison, requires some regularity of network structure, but operates on a fast time scale, driven by call arrivals, and uses very simple rules. Because of the different time scales involved it is quite possible that the benefits of both schemes may be achievable: certainly Key [20] has described an approach to the calculation of implied costs and shadow prices for a fully connected network using DAR, and has shown how the shadow prices can be used interactively to allocate capacity in a network. Current research is developing methods which use sticky random routing on a fast time scale, driven by call arriv als or failure events, with route sets, priorities and trunk reservation selected using implied costs calculated over the longer time scales necessary for averages to be estimated.
[1] Ackerley, R.G. Hysteresistype behaviour in networks with extensive overflow. British Telecom Technol. J. 5 (1987), 42-50.
[2] Akinpelu, J.M. The overload performance of engineered networks with nonhierarchical routing. AT & T Tech. J. 63 (1984), 1261-1281.
[3] Anderson, P.W., Arrow, K.J. & Pines, D. (eds). The economy as an evolving complex system. Redwood City, California: Addison-Wesley (1988).
[4] Braess, D. Über ein Paradoxon aus der Verkehrsplanung. Unternehmenforschung 12 (1968), 258-268.
[5] Brockmeyer, E., Halstrom, H.L. & Jensen, A. The life and works of A.K. Erlang. Copenhagen: Academy of Technical Sciences (1948).
[6] Cohen, J.E. & Kelly, F.P. A paradox of congestion in a queueing network. J. Appl. Prob. 27 (1990), 732-736.
[7] Doyle, P.G. & Snell, J.L. Random walks and electric networks. Mathematical Association of America (1984).
[8] Erlang, A.K. A proof of Maxwell's law, the principal proposition in the kinetic theory of gases (1925). In [5], pp.222-226.
[9] Gallager, R.G. A minimum delay routing algorithm using distributed computation. IEEE Trans. Comm. 25 (1977), 73-85.
[10] Gibbens, R.J., Kelly, F.P. & Key, P.B. Dynamic Alternative Routing - modelling and behaviour. Proceedings 12th International Teletraffic Congress, Turin (ed. M. Bonatti). Amsterdam: Elsevier (1988).
[11] Gibbens, R.J., Hunt, P.J. & Kelly, F.P. Bistability in communication networks. In Disorder in physical systems (eds. G.R. Grimmett & D.J.A. Welsh), pp.113-128. Oxford: Oxford University Press (1990).
[12] Gibbens, R.J. & Kelly, F.P. Dynamic routing in fully connected networks. IMA J. Math. Cont. Information. 7 (1990), 77-111.
[13] Hajek, B. Average case analysis of greedy algorithms for Kelly's triangle problem and the independent set problem. In 26th IEEE Conference on Decision and Control. New York: IEEE (1987).
[14] Kelly, F.P. Reversibility and stochastic networks. Chichester: Wiley (1979).
[15] Kelly, F.P. Blocking probabilities in large circuitswitched networks. Adv. Appl. Prob. 18 (1986), 473-505.
[16] Kelly, F.P. Routing in circuitswitched networks: optimization, shadow prices and decentralization. Adv. Appl. Probab. 20 (1988), 112-144.
[17] Kelly, F.P. Routing and capacity allocation in networks with trunk reservation. Math. Oper. Res. 15 (1990), 771-793.
[18] Kelly, F.P. Loss networks. Ann. Appl. Probab. 1 (1991), 319-378.
[19] Kelly, F.P. Network routing. Phil. Trans. Roy. Soc. Ser. A337 (1991), 343-367.
[20] Key, P.B. Implied cost methodology and software tools for a fully connected network with DAR and trunk reservation. British Telecom Technol. J. 6 (1988), 52-65.
[21] Key, P.B. & Whitehead, M.J. Costeffective use of networks employing Dynamic Alternative Routing. In Proc. 12th Internat. Teletraffic Congress, Turin (ed. M. Bonatti). Amsterdam: Elsevier (1988).
[22] Kingman, J.F.C. Markov population processes. J. Appl. Probab. 6 (1969), 1-18.
[23] Knödel, W. Graphentheoretische Methoden und ihre Anwendungen. pp. 56-59. Berlin: SpringerVerlag (1969).
[24] Langton, C.G. (ed.) Artificial life. Redwood City, California: Addison-Wesley (1989).
[25] Mitra, D. & Seery, J.B. Comparative evaluations of randomized and dynamic routing strategies for circuitswitched networks. IEEE Trans. Comm. 39 (1991), 102-116.
[26] Nagurney, A. Competitive equilibrium problems, variational inequalities and regional science. J. Regional Science 27 (1987), 503-517.
[27] New York Times. What if they closed 42nd Street and nobody noticed? NYT 25 December 1990, p.38.
[28] Pigou, A.C. The economics of welfare. London: Macmillan (1920).
[29] Pines, D. (ed) Emerging syntheses in science. Redwood City, California: Addison-Wesley (1987).
[30] Potts, R.B. & Oliver, R.M. Flows in transportation networks. New York: Academic Press (1972).
[31] Sexton, M. & Kelly, F.P. Second generation service protection networks using SDH. CCITT SGXVIII (1990).
[32] Stacey, R.R. & Songhurst, D.J. Dynamic Alternative Routing in the British Telecom trunk network. International Switching Symposium, Phoenix, Arizona (1987).
[33] Thomson, W. & Tait, P.G. Treatise on Natural Philosophy. Cambridge (1879).
[34] van Vuren, T. & Smart, M.B. Route guidance and road pricing - problems, practicalities and possibilities. Transport Reviews 10 (1990), 269-283.
[35] Walters, A.A. The theory and measurement of private and social cost of highway congestion. Econometrica 29 (1961), 676-699.
[36] Whitt, W. Blocking when service is required from several facilities simultaneously. AT & T Tech. J. 64 (1985), 1807-1856.
[37] Whittle, P. Optimization under constraints. London: Wiley (1971).
[38] Whittle, P. Systems in stochastic equilibrium. Chichester: Wiley (1986).
[39] Wroe, G.A., Cope, G.A. & Whitehead, M.J. Flexible routing in the BT international access network. 7th UK Teletraffic Symposium. Durham: IEE (1990).
The full version of this talk appeared as: `Network routing', Philosophical Transactions of the Royal Society A337, 343-367 (1991). ( I am grateful to Mark Wainwright for turning this into html - see Mark's A small road network.)