A question was raised recently on the RStudio discussion forum about an algorithm for linking spatial points by a network of lines.
Building lines from points is a common task in two areas: visualizing regional development, where lines represent flows from one region to another via color and / or thickness, and network analysis, where the length of a line measures (and visualizes) the distance between two or more areas of interest.
As I was unable to refer the poster of the question to a suitable published walkthrough, I propose one of my own.
It is based on a function, points_to_lines(), which I might be able to extend into a package when time allows. The function takes four arguments, three of which are mandatory:
- data: a data frame of spatial points, expected in {sf} package format; it is placed as the first argument, so the function is pipe friendly
- ids: name of the column containing technical IDs of the points (typically FIPS codes in the US, NUTS in the EU, or some other ID)
- names: name of the column containing names of the points, used for labels
- order_matters: indication whether the order of the points matters (meaning whether a line from A to B is distinct from a line from B to A); optional, the default is TRUE
The function returns a spatial data frame of four columns: ID of the starting point, ID of the ending point, label (names of the two points, separated by a dash) and a geometry column of type LINESTRING; the geometry will be in the same CRS as the original points.
library(sf)
library(dplyr)
points_to_lines <- function(data, ids, names, order_matters = TRUE) {

  # dataframe of combinations - based on row index
  idx <- expand.grid(start = seq(1, nrow(data), 1),
                     end = seq(1, nrow(data), 1)) %>%
    # no line with start & end being the same point
    dplyr::filter(start != end) %>%
    # when order doesn't matter just one direction is enough
    dplyr::filter(order_matters | start > end)

  # cycle over the combinations
  for (i in seq_along(idx$start)) {

    # line object from two points
    wrk_line <- data[c(idx$start[i], idx$end[i]), ] %>%
      st_coordinates() %>%
      st_linestring() %>%
      st_sfc()

    # a single row of results dataframe
    line_data <- data.frame(
      start = pull(data, ids)[idx$start[i]],
      end = pull(data, ids)[idx$end[i]],
      label = paste(pull(data, names)[idx$start[i]],
                    "-",
                    pull(data, names)[idx$end[i]]),
      geometry = wrk_line
    )

    # bind results rows to a single object
    if (i == 1) {
      res <- line_data
    } else {
      res <- dplyr::bind_rows(res, line_data)
    } # /if - saving results

  } # /for

  # finalize function result
  res <- sf::st_as_sf(res, crs = sf::st_crs(data))

  res

} # /function
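The loop above grows the result one row at a time, which is fine for a handful of points but may get slow as the number of pairs grows. As a possible alternative (a sketch only, not part of the function described in this post), the geometries can be built with lapply() and the result assembled in one go. The name points_to_lines_vec() and the three toy points are made up for illustration.

```r
library(sf)

# hypothetical alternative to points_to_lines(); same arguments, no row-by-row binding
points_to_lines_vec <- function(data, ids, names, order_matters = TRUE) {
  n <- nrow(data)
  # all pairs of row indices, excluding a point paired with itself
  idx <- expand.grid(start = seq_len(n), end = seq_len(n))
  keep <- idx$start != idx$end & (order_matters | idx$start > idx$end)
  idx <- idx[keep, ]
  # coordinate matrix of the points (columns X and Y)
  xy <- sf::st_coordinates(data)[, c("X", "Y"), drop = FALSE]
  # one LINESTRING per pair of points
  geoms <- lapply(seq_len(nrow(idx)), function(i) {
    sf::st_linestring(xy[c(idx$start[i], idx$end[i]), , drop = FALSE])
  })
  sf::st_sf(
    start = data[[ids]][idx$start],
    end = data[[ids]][idx$end],
    label = paste(data[[names]][idx$start], "-", data[[names]][idx$end]),
    geometry = sf::st_sfc(geoms, crs = sf::st_crs(data))
  )
}

# three made-up points to try it out
toy_points <- sf::st_sf(
  id = c("a", "b", "c"),
  name = c("Alpha", "Beta", "Gamma"),
  geometry = sf::st_sfc(
    sf::st_point(c(0, 0)),
    sf::st_point(c(1, 0)),
    sf::st_point(c(0, 1)),
    crs = 4326
  )
)

toy_all <- points_to_lines_vec(toy_points, ids = "id", names = "name")
toy_half <- points_to_lines_vec(toy_points, ids = "id", names = "name",
                                order_matters = FALSE)
nrow(toy_all)   # 6 lines: 3 * 2
nrow(toy_half)  # 3 lines: half of the above
```

The sketch reads the ID and name columns with data[[ids]] rather than pull(), which is a design choice of this example and not of the original function.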
The function can be easily sourced & then used as a one liner in any script; it requires only the {sf} and {dplyr} packages, so no cruel or unusual dependencies are involved.
The intended use case is to generate a spatial data frame of lines from a spatial data frame of points (centroids, points on surface or what not) and then join the result with actual data via one of the dplyr::*_join() functions.
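A minimal sketch of such a join follows. The two lines are built by hand so the example runs on its own; they only mimic the column layout returned by points_to_lines(). The flows data frame and its volume column are hypothetical stand-ins for whatever actual data you have.

```r
library(sf)
library(dplyr)

# two hand-made lines with the same columns as the function's output
toy_lines <- st_sf(
  start = c("A", "B"),
  end = c("B", "A"),
  label = c("Alpha - Beta", "Beta - Alpha"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(1, 1))),
    st_linestring(rbind(c(1, 1), c(0, 0))),
    crs = 4326
  )
)

# hypothetical flow data keyed by the same pair of IDs
flows <- data.frame(
  start = c("A", "B"),
  end = c("B", "A"),
  volume = c(120, 45)
)

# the spatial object goes first, so the result stays a spatial data frame
lines_with_flows <- toy_lines %>%
  left_join(flows, by = c("start", "end"))
```

Joining on both start and end keeps the two directions apart, which matters when the flows are not symmetric.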
To demonstrate the use of the function I am showing links between five semi random counties in North Carolina (using the popular nc.shp shapefile that ships with the {sf} package, and is therefore widely available).
# Well known & much loved shapefile of NC included with sf package
nc_polygons <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

set.seed(16)

# five semi random county centroids
nc_points <- nc_polygons %>%
  sf::st_centroid() %>%
  slice(sample(1:nrow(nc_polygons), 5))

# function of lines from points
nc_lines <- points_to_lines(nc_points, ids = "FIPS", names = "NAME")

# a graphic overview
library(ggplot2)

ggplot() +
  geom_sf(data = nc_polygons, color = "gray45", fill = NA) +
  geom_sf(data = nc_lines, color = "red")
The algorithm to create lines comes in two flavors, depending on whether order matters for your use case.
In case order does matter (i.e. a line from Greene to Pender counties is different from the one from Pender to Greene) there will be nrow(data) × (nrow(data) - 1) lines, as each point is connected to every other point except itself.
In case order does not matter (i.e. once a line is drawn from Greene to Pender there is no need to plot another in the opposite direction) only half as many lines are required.
To pick the desired behavior change the value of the order_matters argument; the default is TRUE, meaning yes, order does matter.
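The arithmetic can be checked without any spatial data. The sketch below reproduces the index filtering used inside the function for five points, in base R only; the variable names are made up for the example.

```r
n <- 5

# all pairs of row indices, as in the function
idx <- expand.grid(start = seq_len(n), end = seq_len(n))

# drop pairs where start and end are the same point
ordered_pairs <- idx[idx$start != idx$end, ]
n_ordered <- nrow(ordered_pairs)      # 20, i.e. n * (n - 1)

# keep a single direction per pair
unordered_pairs <- ordered_pairs[ordered_pairs$start > ordered_pairs$end, ]
n_unordered <- nrow(unordered_pairs)  # 10, i.e. choose(n, 2)
```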
# when order matters >> both directions are required >> 20 rows
points_to_lines(nc_points, ids = "FIPS", names = "NAME", order_matters = TRUE) %>%
  knitr::kable()
start | end | label | geometry |
---|---|---|---|
37079 | 37141 | Greene - Pender | LINESTRING (-77.67889 35.48… |
37161 | 37141 | Rutherford - Pender | LINESTRING (-81.91787 35.39… |
37181 | 37141 | Vance - Pender | LINESTRING (-78.41127 36.36… |
37023 | 37141 | Burke - Pender | LINESTRING (-81.70216 35.74… |
37141 | 37079 | Pender - Greene | LINESTRING (-77.91628 34.52… |
37161 | 37079 | Rutherford - Greene | LINESTRING (-81.91787 35.39… |
37181 | 37079 | Vance - Greene | LINESTRING (-78.41127 36.36… |
37023 | 37079 | Burke - Greene | LINESTRING (-81.70216 35.74… |
37141 | 37161 | Pender - Rutherford | LINESTRING (-77.91628 34.52… |
37079 | 37161 | Greene - Rutherford | LINESTRING (-77.67889 35.48… |
37181 | 37161 | Vance - Rutherford | LINESTRING (-78.41127 36.36… |
37023 | 37161 | Burke - Rutherford | LINESTRING (-81.70216 35.74… |
37141 | 37181 | Pender - Vance | LINESTRING (-77.91628 34.52… |
37079 | 37181 | Greene - Vance | LINESTRING (-77.67889 35.48… |
37161 | 37181 | Rutherford - Vance | LINESTRING (-81.91787 35.39… |
37023 | 37181 | Burke - Vance | LINESTRING (-81.70216 35.74… |
37141 | 37023 | Pender - Burke | LINESTRING (-77.91628 34.52… |
37079 | 37023 | Greene - Burke | LINESTRING (-77.67889 35.48… |
37161 | 37023 | Rutherford - Burke | LINESTRING (-81.91787 35.39… |
37181 | 37023 | Vance - Burke | LINESTRING (-78.41127 36.36… |
# if directions don't matter >> a single direction is enough >> 10 rows only
points_to_lines(nc_points, ids = "FIPS", names = "NAME", order_matters = FALSE) %>%
  knitr::kable()
start | end | label | geometry |
---|---|---|---|
37079 | 37141 | Greene - Pender | LINESTRING (-77.67889 35.48… |
37161 | 37141 | Rutherford - Pender | LINESTRING (-81.91787 35.39… |
37181 | 37141 | Vance - Pender | LINESTRING (-78.41127 36.36… |
37023 | 37141 | Burke - Pender | LINESTRING (-81.70216 35.74… |
37161 | 37079 | Rutherford - Greene | LINESTRING (-81.91787 35.39… |
37181 | 37079 | Vance - Greene | LINESTRING (-78.41127 36.36… |
37023 | 37079 | Burke - Greene | LINESTRING (-81.70216 35.74… |
37181 | 37161 | Vance - Rutherford | LINESTRING (-78.41127 36.36… |
37023 | 37161 | Burke - Rutherford | LINESTRING (-81.70216 35.74… |
37023 | 37181 | Burke - Vance | LINESTRING (-81.70216 35.74… |
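For the network analysis use case mentioned at the beginning, the length of each line can be added with sf::st_length(). The sketch below uses two made-up lines in a projected CRS (EPSG:3857, units in meters) so that the numbers are easy to verify by hand. For lines in geographic coordinates, such as the NC example, you may prefer to run st_transform() to a suitable projected CRS first; that choice is an assumption of this sketch, not something the function does for you.

```r
library(sf)

# two made-up lines in a projected CRS (EPSG:3857, units are meters)
toy_lines <- st_sf(
  label = c("A - B", "A - C"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(3, 4))),
    st_linestring(rbind(c(0, 0), c(6, 8))),
    crs = 3857
  )
)

# st_length() returns a units vector, one value per line
toy_lines$length <- st_length(toy_lines)

# plain numbers, with the unit stripped
line_lengths <- as.numeric(toy_lines$length)
```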