A question was raised recently on the RStudio discussion forum about an algorithm for linking spatial points by a network of lines.

Deriving lines from points is a common task both in visualizing regional development, where the lines represent flows from one region to another via color and / or thickness, and in network analysis, where they are used to measure and visualize the distance (i.e. the length of the line) between two or more areas of interest.

As I was unable to refer the poster of the question to a suitable published walkthrough, I propose one of my own.

It is based on a function, points_to_lines(), which I might extend into a package when time allows. The function takes four arguments, three of which are mandatory:

  • data frame of spatial points, expected in {sf} package format; it is placed as the first argument, so the function is pipe friendly

  • name of column containing technical IDs of points (typically FIPS codes in the US, NUTS in the EU, or some other ID)

  • name of column containing names of the points for labels

  • indication whether the order of the points matters (i.e. whether a line from A to B is distinct from a line from B to A); optional, the default is TRUE

The function returns a spatial data frame of four columns: ID of starting point, ID of ending point, label (names of the two points, separated by a dash) and a geometry column of type LINESTRING; the geometry will be in the same CRS as original points.

library(sf)
library(dplyr)

points_to_lines <- function(data, ids, names, order_matters = TRUE) {
  
  # dataframe of combinations - based on row index
  idx <- expand.grid(start = seq_len(nrow(data)),
                     end = seq_len(nrow(data))) %>%
    # no line with start & end being the same point
    dplyr::filter(start != end) %>%  
    # when order doesn't matter just one direction is enough
    dplyr::filter(order_matters | start > end) 
  
  
  # cycle over the combinations
  for (i in seq_along(idx$start)) {
    
    # line object from two points
    wrk_line  <- data[c(idx$start[i], idx$end[i]), ] %>% 
      st_coordinates() %>% 
      st_linestring() %>% 
      st_sfc()
    
    # a single row of results dataframe; the id and name columns are
    # looked up by name, as ids and names are passed in as strings
    line_data <- data.frame(
      start = data[[ids]][idx$start[i]],
      end = data[[ids]][idx$end[i]],
      label = paste(data[[names]][idx$start[i]], 
                    "-", 
                    data[[names]][idx$end[i]]),
      geometry = wrk_line
    )
    
    # bind results rows to a single object
    if (i == 1) {
      res <- line_data
      
    } else {
      res <- dplyr::bind_rows(res, line_data)
      
    } # /if - saving results
    
  } # /for
  
  # finalize function result
  res <- sf::st_as_sf(res, crs = sf::st_crs(data))
  
  res
  
} # /function

The function can be easily sourced and then used as a one-liner in any script; it requires only the {sf} and {dplyr} packages, so no cruel or unusual dependencies are involved.

The intended use case is to generate a spatial data frame of lines from a spatial data frame of points (centroids, points on surface, or similar) and then to join the result with actual data via one of the dplyr::*_join() functions.
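
The join step can be sketched as follows. This is a minimal illustration only: the two hand-made lines stand in for the output of points_to_lines(), and the flows data frame with its volume column is hypothetical.

```r
library(sf)
library(dplyr)

# two hand-made lines standing in for the output of points_to_lines()
lines <- st_sf(
  start = c("A", "B"),
  end = c("B", "A"),
  label = c("A - B", "B - A"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(1, 1))),
    st_linestring(rbind(c(1, 1), c(0, 0))),
    crs = 4326
  )
)

# hypothetical flow data, keyed by the same start / end IDs
flows <- data.frame(
  start = c("A", "B"),
  end = c("B", "A"),
  volume = c(120, 45)
)

# the join keeps the sf class, so the result can be plotted directly
lines_with_flows <- left_join(lines, flows, by = c("start", "end"))
```

Joining on both start and end keeps the two directions apart, which matters when the flows are not symmetric.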

To demonstrate the function, I show links between five semi-random counties in North Carolina, using the popular nc.shp shapefile that ships with the {sf} package and is therefore widely available.

# Well known & much loved shapefile of NC included with sf package
nc_polygons <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

set.seed(16)

# five semi random county centroids
nc_points <- nc_polygons %>% 
  sf::st_centroid() %>% 
  slice(sample(1:nrow(nc_polygons), 5))
  
# function of lines from points
nc_lines <- points_to_lines(nc_points, ids = "FIPS", names = "NAME")

# a graphic overview
library(ggplot2)
ggplot() +
  geom_sf(data = nc_polygons, color = "gray45", fill = NA) +
  geom_sf(data = nc_lines, color = "red") 
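
Besides plotting, the lines can also be measured, which covers the network analysis use case. A minimal sketch, assuming a projected CRS so that lengths come out in metres; the two points and EPSG:32617 are illustrative and not part of the NC example:

```r
library(sf)

# two illustrative points, 3 km apart in x and 4 km apart in y (UTM zone 17N)
pts <- st_sfc(st_point(c(500000, 3900000)),
              st_point(c(503000, 3904000)),
              crs = 32617)

# a single line joining them, built the same way as inside points_to_lines()
line <- st_sfc(st_linestring(st_coordinates(pts)), crs = st_crs(pts))

# st_length() returns a units object; a 3-4-5 triangle gives 5000 m here
line_length <- st_length(line)
```

For data in geographic coordinates, such as the NC counties, st_length() works on an ellipsoid or sphere rather than on a plane, so the planar arithmetic above would not apply directly.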

The algorithm to create lines comes in two flavors, depending on whether order matters for your use case.

In case order does matter – i.e. a line from Greene to Pender counties is different from the one from Pender to Greene – there will be nrow(data) × (nrow(data) - 1) lines (each point is connected to every other point except itself).

In case order does not matter – i.e. once a line is drawn from Greene to Pender there is no need to draw another in the opposite direction – only half as many lines are required, i.e. nrow(data) × (nrow(data) - 1) / 2.

To choose the desired behavior, set the order_matters argument; the default is TRUE, meaning yes, order does matter.
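
The two counts can be checked on row indices alone, without any spatial data. A small sketch for five points, mirroring the filtering logic of the function in base R:

```r
n <- 5

# all ordered pairs of row indices, including a point paired with itself
idx <- expand.grid(start = seq_len(n), end = seq_len(n))

# order matters: drop only the self-pairs
directed <- idx[idx$start != idx$end, ]

# order does not matter: keep a single direction per pair
undirected <- directed[directed$start > directed$end, ]

nrow(directed)    # n * (n - 1) = 20
nrow(undirected)  # n * (n - 1) / 2 = 10
```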

# when order matters >> both directions are required >> 20 rows
points_to_lines(nc_points, ids = "FIPS", names = "NAME", order_matters = TRUE) %>% 
  knitr::kable()

| start | end   | label               | geometry                     |
|:------|:------|:--------------------|:-----------------------------|
| 37079 | 37141 | Greene - Pender     | LINESTRING (-77.67889 35.48… |
| 37161 | 37141 | Rutherford - Pender | LINESTRING (-81.91787 35.39… |
| 37181 | 37141 | Vance - Pender      | LINESTRING (-78.41127 36.36… |
| 37023 | 37141 | Burke - Pender      | LINESTRING (-81.70216 35.74… |
| 37141 | 37079 | Pender - Greene     | LINESTRING (-77.91628 34.52… |
| 37161 | 37079 | Rutherford - Greene | LINESTRING (-81.91787 35.39… |
| 37181 | 37079 | Vance - Greene      | LINESTRING (-78.41127 36.36… |
| 37023 | 37079 | Burke - Greene      | LINESTRING (-81.70216 35.74… |
| 37141 | 37161 | Pender - Rutherford | LINESTRING (-77.91628 34.52… |
| 37079 | 37161 | Greene - Rutherford | LINESTRING (-77.67889 35.48… |
| 37181 | 37161 | Vance - Rutherford  | LINESTRING (-78.41127 36.36… |
| 37023 | 37161 | Burke - Rutherford  | LINESTRING (-81.70216 35.74… |
| 37141 | 37181 | Pender - Vance      | LINESTRING (-77.91628 34.52… |
| 37079 | 37181 | Greene - Vance      | LINESTRING (-77.67889 35.48… |
| 37161 | 37181 | Rutherford - Vance  | LINESTRING (-81.91787 35.39… |
| 37023 | 37181 | Burke - Vance       | LINESTRING (-81.70216 35.74… |
| 37141 | 37023 | Pender - Burke      | LINESTRING (-77.91628 34.52… |
| 37079 | 37023 | Greene - Burke      | LINESTRING (-77.67889 35.48… |
| 37161 | 37023 | Rutherford - Burke  | LINESTRING (-81.91787 35.39… |
| 37181 | 37023 | Vance - Burke       | LINESTRING (-78.41127 36.36… |

# if directions don't matter >> a single direction is enough >> 10 rows only
points_to_lines(nc_points, ids = "FIPS", names = "NAME", order_matters = FALSE) %>% 
  knitr::kable()

| start | end   | label               | geometry                     |
|:------|:------|:--------------------|:-----------------------------|
| 37079 | 37141 | Greene - Pender     | LINESTRING (-77.67889 35.48… |
| 37161 | 37141 | Rutherford - Pender | LINESTRING (-81.91787 35.39… |
| 37181 | 37141 | Vance - Pender      | LINESTRING (-78.41127 36.36… |
| 37023 | 37141 | Burke - Pender      | LINESTRING (-81.70216 35.74… |
| 37161 | 37079 | Rutherford - Greene | LINESTRING (-81.91787 35.39… |
| 37181 | 37079 | Vance - Greene      | LINESTRING (-78.41127 36.36… |
| 37023 | 37079 | Burke - Greene      | LINESTRING (-81.70216 35.74… |
| 37181 | 37161 | Vance - Rutherford  | LINESTRING (-78.41127 36.36… |
| 37023 | 37161 | Burke - Rutherford  | LINESTRING (-81.70216 35.74… |
| 37023 | 37181 | Burke - Vance       | LINESTRING (-81.70216 35.74… |