My Alma Mater, Prague School of Economics, is located in Žižkov. A formerly working class neighborhood, now rather gentrified, it has to this day retained some traces of its former rougher edges. One of these is an active night life.

A crawl through the bars of Žižkov is therefore a familiar activity for many VŠE students, and can serve as a gateway drug for serious optimization techniques. Such as the Travelling Salesman Problem.

The TSP is an optimization classic, with a number of well understood and highly standardized solutions available in the context of statistical programming language R.

In this blog post I would like to share a practical example of solving the TSP using Open Street Map data of bars via {osmdata} and HERE routing engine via {hereR}. The actual solution will be found by utilizing the {TSP} package.

library(sf) # for spatial data handling
library(dplyr) # for general data frame processing
library(osmdata) # to get data in from OSM
library(leaflet) # to show data interactively
library(hereR) # interface to routing engine
library(TSP) # to solve TSP

The first step in our exercise is acquiring data of Žižkov bars. A search is performed over the area of core Žižkov, defined as a polygon, using the OSM Overpass API.

As there seems not to be a clear consensus over what constitutes a bar, restaurant or a pub in Prague I am including all three of the possible amenities.

# bbox = http://bboxfinder.com - "core" Žižkov
core_zizkov <- c(14.437408,50.081131,
                 14.452686,50.087561)

# acquire bar data - https://wiki.openstreetmap.org/wiki/Map_features#Amenity 
search_res <- opq(bbox = core_zizkov) %>%
  add_osm_feature(key = "amenity", 
                  value = c("bar", "restaurant", "pub")) %>%
  osmdata_sf(quiet = T) 

# pulls bars as points
bars <- search_res$osm_points %>%  
  filter(!is.na(name)) %>% 
  select(name)

# show results
leaflet(bars) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addCircleMarkers(fillColor = "red",
                   radius = 5,
                   stroke = F,
                   fillOpacity = .75,
                   label = ~ name)


We have located 76 bars, implying a distance matrix of 5776 elements. Not a huge one by today’s standards — but big enough to think twice about trying to solve using a pen and a piece of paper.

I have found that while it is not overly difficult to solve the TSP for all the Žižkov bars there is educational value in running the TSP over only a small sample. I have found it advantageous to be able to actually show the distance matrix — and this page will easily accommodate only about a 5×5 matrix.

# a sample of bars to make the matrix fit a web page
vzorek <- bars %>% 
  slice_sample(n = 5)

# a beer tankard icon for nicer display
beer_icon <- makeAwesomeIcon(
  icon = "beer",
  iconColor = "black",
  markerColor = "blue",
  library = "fa"
)

# a quick overview of our selection
leaflet(vzorek) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addAwesomeMarkers(data = vzorek,
                    icon = beer_icon, # the awesome icon declared earlier
                    label = ~name)


The easiest distance matrix to calculate is plain “as the crow flies” distance. This can be calculated via a sf::st_distance() call.

The resulting matrix will be based on pure distance, with some differences in interpretation depending on coordinate reference system of underlying data (Euclidean in projected CRS and spherical in unprojected CRS).

# distance matrix "as the crow flies"
crow_matrix <- st_distance(vzorek,
                           vzorek)

# naming the dimensions for easier orientation
rownames(crow_matrix) <- vzorek$name
colnames(crow_matrix) <- vzorek$name

# a visual check; note that the matrix has a {units} dimension
crow_matrix
## Units: [m]
##                       SOU100 Behind the Courtain Chocobamba  Bar Fud  Olympos
## SOU100                0.0000            136.4923   243.5348 444.1134 465.6544
## Behind the Courtain 136.4923              0.0000   247.9489 578.6872 514.7123
## Chocobamba          243.5348            247.9489     0.0000 590.8310 274.5917
## Bar Fud             444.1134            578.6872   590.8310   0.0000 629.8178
## Olympos             465.6544            514.7123   274.5917 629.8178   0.0000

Calculating the distance matrix using plain distance is easy, and the resulting matrix is symmetrical (distance from A to B equals distance from B to A). It is also hollow (distance from A to A itself is zero).

Solving the TSP for such a matrix is straightforward, as the hard work has been outsourced to the {TSP} package internals.

# solve the TSP via {TSP}
crow_tsp <- crow_matrix %>% 
  units::drop_units() %>%  # get rid of unit dimension
  # declaring the problem as a symmetric TSP
  TSP() %>%
  solve_TSP()

# the tour (crawl) as sequence of bars
vzorek$name[as.numeric(crow_tsp)]
## [1] "Bar Fud"             "Olympos"             "Chocobamba"         
## [4] "Behind the Courtain" "SOU100"

Once we have the optimal route calculated it can be visualized using {leaflet}. The sequence of stops needs to be completed (by repeating the first stop after the last) and cast from points to a linestring.

stops <- as.numeric(crow_tsp) # sequence of "cities" as indices

# bars in sequence, with the first repeated in last place
crow_result <- vzorek[c(stops, stops[1]), ] %>%
  st_combine() %>% # combined to a single object
  st_cast("LINESTRING") # & presented as a route (a line)

# present the as-the-crow-flies based route in crimson color
leaflet(crow_result) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolylines(color = "crimson",
               popup = "as the crow flies...") %>% 
  addAwesomeMarkers(data = vzorek,
                    icon = beer_icon, # the awesome icon declared earlier
                    label = ~name)


From the visual overview we can see an obvious shortcoming of the “as the crow flies” approach: it completely ignores other constraints except for distance — such as the road network.

Thus while the route shown is “optimal” in the sense that it forms the shortest path joining the five bars selected, it is not one that we could actually follow (unless we were a flying crow).

This shortcoming can be resolved by using an alternative distance matrix as input, while retaining the techniques of {TSP} for the actual route selection. A possible source of more applicable data are routing engines, available to R users via API interfacing packages.

There are several of them available, my personal favorite being the {hereR} package interfacing to the HERE routing engine. I have found HERE to be very reliable and rich in detail. It does require registration, but its free tier is more than adequate for most individual users.

# set the HERE API key; mine is stored in an envir variable
hereR::set_key(Sys.getenv("HERE_API_KEY"))

# a full set of all combinations - 5 × 5 = 25 rows
indices <- expand.grid(from = seq_along(vzorek$name), 
                       to = seq_along(vzorek$name))

# call routing API for all permutations & store for future use
for (i in seq_along(indices$from)) {
  
  active_route <- hereR::route(origin = vzorek[indices$from[i], ],
                               destination = vzorek[indices$to[i], ],
                               transport_mode = "car") %>% 
    # technical columns for easier use and presentation
    mutate(idx_origin = indices$from[i],
           idx_destination = indices$to[i],
           route_name = paste(vzorek$name[indices$from[i]],
                        ">>",
                        vzorek$name[indices$to[i]])) %>% 
    relocate(idx_origin, idx_destination, route_name) %>% 
    st_zm() # drop z dimension, as it messes up with leaflet viz
  
  if (i == 1) {
    # if processing the first sample = initiate a result set
    routes <- active_route 
  } else {
    # not processing the first sample = bind to the existing result set
    routes <- routes %>% 
      bind_rows(active_route)
  }
  
}

# a quick overview of structure of the routes data frame
glimpse(routes)
## Rows: 25
## Columns: 15
## $ idx_origin      <int> 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, …
## $ idx_destination <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, …
## $ route_name      <chr> "SOU100 >> SOU100", "Behind the Courtain >> SOU100", "…
## $ id              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ rank            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ section         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ departure       <dttm> 2022-03-28 18:08:37, 2022-03-28 18:08:37, 2022-03-28 …
## $ arrival         <dttm> 2022-03-28 18:08:37, 2022-03-28 18:09:01, 2022-03-28 …
## $ type            <chr> "vehicle", "vehicle", "vehicle", "vehicle", "vehicle",…
## $ mode            <chr> "car", "car", "car", "car", "car", "car", "car", "car"…
## $ distance        <int> 0, 169, 504, 895, 606, 619, 0, 1050, 971, 1199, 1381, …
## $ duration        <int> 0, 24, 105, 139, 113, 98, 0, 177, 145, 189, 236, 182, …
## $ duration_base   <int> 0, 24, 94, 101, 112, 88, 0, 165, 103, 173, 224, 171, 0…
## $ consumption     <dbl> 0.0000, 0.0610, 0.1487, 0.7347, 0.1400, 0.4019, 0.0000…
## $ geometry        <GEOMETRY [°]> POLYGON ((14.44988 50.08544..., LINESTRING (1…

The routing results give us several pieces of data:

  • the routes as linestring objects in EPSG:4326 (for visualization later on)
  • distance of the route (in meters)
  • travel time (in seconds) both raw and adjusted for traffic
  • petrol consumption

To these I have added three technical columns: indices of start & destination for easier joining of solved TSP results back and the name of the route as string for visualization purposes.

Having a variety of metrics will be helpful in construction of alternative distance matrices.

The first routing distance matrix will be based on route distance; notice that while the matrix is hollow it is not symmetrical. This is not surprising, as routing is not commutative — optimal route from A to B need not be the same as from B to A, due to constraints such as one way roads. Žižkov is a veritable warren of one way streets.

# distance matrix based on actual distances
distance_matrix <- matrix(routes$distance,
                          nrow = nrow(vzorek),
                          ncol = nrow(vzorek))

# naming the dimensions for easier orientation
rownames(distance_matrix) <- vzorek$name
colnames(distance_matrix) <- vzorek$name

# a visual check; the units are meters (distance)
distance_matrix
##                     SOU100 Behind the Courtain Chocobamba Bar Fud Olympos
## SOU100                   0                 619       1381    1011    1169
## Behind the Courtain    169                   0       1007     637     795
## Chocobamba             504                1050          0     972     661
## Bar Fud                895                 971       1698       0    1523
## Olympos                606                1199        360    1031       0
# solve the TSP via {TSP}
distance_tsp <- distance_matrix %>% 
  # declaring the problem as asymmetric TSP
  ATSP() %>%
  solve_TSP()

# the tour (crawl) as sequence of bars
vzorek$name[as.numeric(distance_tsp)]
## [1] "Chocobamba"          "SOU100"              "Bar Fud"            
## [4] "Behind the Courtain" "Olympos"

Once we have solved the TSP and figured the sequence of “cities” to visit it is time to report our results.

For this purpose it is advantageous to prepare a data frame of indices of start and destination, and join it back with the original dataset from HERE API (which contains routes as linestrings).

stops <- as.numeric(distance_tsp) # sequence of "cities" as indices

# a route as a set of origin & destination pairs, as indexes,
# destination is offset by one from start (last destination = first start)
distance_route <- data.frame(idx_origin = stops,
                             idx_destination = c(stops[2:(nrow(vzorek))],
                                                 stops[1]))

# amend the origin & destination indexes by actual routes
distance_result <-  distance_route %>% 
  inner_join(routes,
             by = c("idx_origin", "idx_destination")) %>% 
  st_as_sf() 

# present the distance based route in goldenrod color
leaflet(distance_result) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolylines(color = "GoldenRod",
               popup = ~route_name) %>% 
  addAwesomeMarkers(data = vzorek,
                    icon = beer_icon, # the awesome icon declared earlier
                    label = ~name)


Since the HERE API is generous in terms of results provided it is not difficult to construct an alternative distance matrix, using a different metric. This could be either trip duration or petrol consumption.

In our specific situation both of these can be expected to be be highly correlated with the plain distance results. All the streets in Žižkov are of very similar type, and the average speed & consumption are unlikely to vary greatly between the routes.

The most significant difference between the distance and time based TSP will be driven by current traffic, which is a factor HERE routing engine considers.

# distance matrix based on travel time
duration_matrix <- matrix(routes$duration,
                          nrow = nrow(vzorek),
                          ncol = nrow(vzorek))

# names make the distance matrix easier to interpret
rownames(duration_matrix) <- vzorek$name
colnames(duration_matrix) <- vzorek$name

# a visual check; the units are seconds (time)
duration_matrix
##                     SOU100 Behind the Courtain Chocobamba Bar Fud Olympos
## SOU100                   0                  98        236     183     205
## Behind the Courtain     24                   0        182     120     151
## Chocobamba             105                 177          0     216      99
## Bar Fud                139                 145        268       0     244
## Olympos                113                 189         58     203       0
# solving using the same pattern as distance based TSP
duration_tsp <- duration_matrix %>% 
  ATSP() %>% 
  solve_TSP() 

# the tour (crawl) as sequence of bars
vzorek$name[as.numeric(duration_tsp)]
## [1] "SOU100"              "Behind the Courtain" "Bar Fud"            
## [4] "Olympos"             "Chocobamba"

Once we have solved the trip duration optimized TSP we again need to report the results; in our use case the output is very similar to the distance based one.

This will not necessarily be the case in other contexts, especially ones with greater variation of road types (city streets vs. highways).

# the same steps as for distance based matrix
stops <- as.numeric(duration_tsp)

duration_route <- data.frame(idx_origin = stops,
                             idx_destination = c(stops[2:(nrow(vzorek))],
                                                 stops[1]))
# again, the same as for distance based calculation
duration_result <-  duration_route %>% 
  inner_join(routes,
             by = c("idx_origin", "idx_destination")) %>% 
  st_as_sf() 

# present the duration based route in light blue color
leaflet(duration_result) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolylines(color = "cornflowerblue",
               popup = ~route_name) %>% 
  addAwesomeMarkers(data = vzorek,
                    icon = beer_icon,
                    label = ~ name)


I believe my tongue in cheek example has succeeded in showing two things:

  • the ease of applying a standardized solution (the {TSP} package) to a well known and well understood problem (the Travelling Salesman Problem) within the context of R ecosystem
  • construction of distance matrices from HERE API routing results, with option to optimize for multiple metrics (such as minimizing the travel distance, travel time and petrol consumption)