Wednesday, December 12, 2007

Within 50 km, revisited

Commenting on the "Within 50 km" entry blog, a friend of mine remarks that the commitment by the Spanish government concerning the coverage of the high-speed railroad network has been probably misreported by the media, and it might not apply to every town in the country, but only to most important cities. So, let us restate such commitment in a plausible way:

Spanish high-speed railroad network will be developed so that every continental province capital will lie within 50 km of a station.

(Please note that I have no source confirming/denying whether this is really the government's statement.) This is a rather weaker commitment, yet the question remains open whether it can be achieved with a network whose total length will be 2,230 km by 2010, according to the government itself.

Let us consider the 47 continental Spanish province capitals and investigate the minimum length of a network spanning the country with the property that each of the 47 cities lies within 50 km of a network node (station) --we will call this the coverage property. It is easy to see that the network has to be a Steiner tree. We will mathematically prove in a separate entry that the solutions to the problem are in fact Steiner trees constructed out of 47 base points, each lying on the circle with radius 50 km centered at the i-th city. A basic theorem about Steiner trees tells us that there can be no more than 47−2 additional Steiner points arising on the construction of the tree; this allows us to regard the solution network as a minimum spanning tree on 47+45=92 nodes. If there happens to be a solution with less than 92 nodes, we still have one with exactly 92 nodes: just add the remaining points arbitrarily along the original network. Summing up, we can formulate our problem as that of finding meshes of n=92 nodes with the coverage property whose minimum spanning tree length is minimum.

I have implemented a genetic algorithm to solve the problem, along the following lines:

  • Each candidate solution (or individual) of the pool is merely a set of n points. The fitness of an individual is the complementary of its "length", which is calculated as the length of its associated minimum spanning tree, with an additional penalty for each uncovered city (i.e. each city lying more than 50 km away from every point of the solution).
  • Initialization of each individual is implemented with a mix of "base points" lying at 50 km from some city and "free points" resulting from the weighted sum of two randomly selected cities. This mixed strategy ensures that the initial population has a decent proportion of individuals satisfying the coverage property.
  • On each generation, the 25% fittest individuals are selected for survival and breeding.
  • Crossover of two individuals a and b is implemented as follows: A city c is selected at random and the points of a and b are sorted according to their distance to c. The child individual is built with the first m points of a and the nm last points of b, for some random m between 0 and n. The points of the child individual can also be randomly mutated: a mutated point results from the weighted sum of the original and some randomly selected city.

I wrote a C++ program implementing this algorithm (you will need Boost to compile it): the program keeps iterating the population until aborted by the user, and on each iteration it outputs the minimum network length achieved so far and a SVG picture of the best candidate solution. Coordinates of province capitals have been obtained from the Spanish Ministry of the Interior, and translated to UTM (within the same UTM zone even if this incurs some curvature error) with Gabriel Ortiz's coordinate converter. I ran the program for hours, playing a bit with the initial parameters, and each run stabilized at a minimum length above 2,500 km. The figure below shows the best solution I got, with 2,535 km in length, obtained with a population size of 5,000 and a mutation rate of 0.01 after 4,000 iterations. You can observe a characteristic feature of Steiner trees by which confluent edges never form an angle smaller than 120º.

Take into account that in our rendering of the problem we have made some simplifying assumptions that in general underestimate the network length needed to satisfy the coverage commitment in the real world. For instance:

  • Railroad segments between nodes are always assumed to be straight lines.
  • We have laid out the network from scratch, without considering that part of it (Madrid-Sevilla, Madrid-Barcelona, Madrid-Lisbon, etc.) is already in place or planned, which, by definition, increases the length of the optimum compatible solution we can build from there.

So, unless we have made some mistake in our analysis, it is impossible to extend the high-speed railroad network to within 50 km of every province capital with only 2,230 km as planned by 2010.

No comments:

Post a Comment