How does GraphHopper send out traveling salesmen?

The traveling salesman problem is the problem of finding the shortest possible route through a given set of locations whereby each loaction must only be visited once and the traveling saleman must return to its start location. A very good introduction into traveling salesman problems has been done by William Cook. It is fun to read!

One of the properties of the problem that makes it interesting and challenging at the same time is that the solution space, i.e. the number of possible routes, grows very fast with the number of locations to be visited. For example, for a 50 location problem there are 49! possible routes – when setting one location as start and end, and dealing with asymmetric travel distances. 49! results in a number with 63 digits.

Thus, for a certain number of locations visiting the entire solution space is no option anymore. However, mathematicians and operation researchers have developed methods that guarantee the optimal solution without visiting all possible routes, e.g. various branch and cut algorithms. Furthermore, they have developed methods that have been proven to find good solutions very fast, but cannot guarantee a certain quality. Latter are usually called heuristics.

Geoffrey de Smet recently wrote a nice article about “How good are human planners?”. He showed in a non-representative experiment that the ordinary heuristics humans apply to solve these problems are ok, but in comparison to machines they are not that good. Consequently, they can hardly judge the quality of algorithms that usually solve these problems much better than they can.

To judge the quality and speed of algorithms, scientists have developed artificial problems to benchmark them. These benchmarking problems usually come with a simple distance metric such as Euclidean distance, or with precompiled real world travel distances. This is the best way to evaluate the optimization algorithm. However, when solving real world problems, the computation time it takes to calculate travel times and distances based on real networks is relevant too. To put it in other words, point-to-point routing is an elementary part of solving these real world problems. Therefore, to benchmark the capabilities of our Optimization API to solve real world problems, we do two things here.

First, to evaluate the quality of the applied optimization algorithm, we benchmark it against classical traveling salesman benchmarking problems.
Second, we created 8 real world problems differing in the number of locations and measured the speed of solving the entire problem, i.e. not only the speed of solving the underlying optimization problem, but also the speed of assigning geo-codes to network links and solving the required point-to-point routing problems.

Classic Benchmark Problems

Lets start with benchmarking against the classical problems. Here we use a subset of problems from the TSPLib collected by the chair of discrete and combinatorial optimization at the Ruprecht-Karl-Universität Heidelberg. To keep it manageable, we only consider problems with up to 300 locations yielding 47 problems, solved them 3 times with a 2,2 GHz Intel Core i7 MacBook and averaged the results (to reduce random variations). The following table is an aggregated view on the results, i.e. we classified the problems according to their size (<25 locations, 50-100, 100-200 and 200-300) and averaged the computation time as well as the objective value relative to the best known solution.

[table id=1 /]

The table shows that in average our algorithm calculates best known results for problems smaller than 100 locations. For the residual problems it is shown that our results are not worse than 2 percent of the best known solution. Since these are average results, it is fair to also mention the worst result which is not worse than 5 percent relative to the best known result. The table also shows that the average computation time for a problem with a size smaller than 25 locations amounts to 23ms, whereas the computation time for a problem of size 50-100 locations amounts to 357ms.

Real World Problems

Lets conduct the second analysis. Here, we create 8 random real world problems of different size in the city of Berlin. That is we just draw 10, 25, 50, 100, 150, 200, 250 and 300 random geo codes in Berlin, and solved them with our Direction API. Again, we did that 3 times to reduce random effects and the impact of network delays. The following table shows the average computation time and the travel times determined by our Directions API. Note that computation time includes snapping geo-codes to OSM network, calculating n x n travel time matrix, optimizing the traveling salesman problem and network delays.

[table id=2 /]

Thus, it takes in average half a second to solve the entire problem with 25 locations, and 3 seconds to solve a problem with 100 locations. We think that this is rather fast. Thus, it allows us to solve several hundreds of these small to medium size problems on our servers in a minute. To give a visual impression of the solution for the 200-location problem, look at the following figure.

Note that we do not demand our Optimization API to produce optimal and best known results since this would be too expensive in terms of computation time. However, we want it to calculate very good and reasonable results in short time, and we designed our API such that it can deal with various additional side constraints without loosing much of its speed. Therefore, it cannot only be used to solve the classical traveling salesman problem, but also to solve problems where the salesman needs to consider time windows at locations, where he needs to consider his own legal rest periods, where he does not need to come back to its start location and where an arbitrary number of additional relations between locations exists (such as that location B needs to be visited directly after location A etc.).

Just play with our official live examples, sign up and get started!