This document intended to get you up to speed with using
`rsgeo`

and comfortable branching out from using sf more
generally.

## Understanding `sf`

Most R users that do geospatial analysis are familiar with sf and
understand spatial workflows in that context only. Many users of
`sf`

view it as this magical data frame that lets you do
spatial analysis—and that is exactly how it feels! This means that sf
has done a great job in making spatial analysis feel a lot easier for
the vast majority of R users.

But `sf`

needs to be understood in a more fundamental way.
`sf`

is named after the Simple Feature Access Standard.
Simple features are an agreed upon way to represent “geometric
primitives”—things like points, linestrings, polygons, and their multi-
types. The `sf`

package builds a hierarchy off of these. We
have `sfg`

, `sfc`

, and `sf`

objects.

###
`sfg`

objects

At the core is the `sfg`

class which are representations
of simple feature geometries. `sfg`

class objects are
representations of a simple feature. They are a single geometry at a
time. We can think of them as a scalar value (even though R does not
have the concept of a scalar).

```
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
# create a point
pnt <- st_point(c(0, 10))
# create a line
ln <- st_linestring(matrix(c(0, 1, 0, 0), ncol = 2))
class(pnt)
#> [1] "XY" "POINT" "sfg"
class(ln)
#> [1] "XY" "LINESTRING" "sfg"
```

Each scalar value can be used independently which is useful in itself.

```
st_length(ln)
#> [1] 1
```

`sfg`

objects are very simple objects constructed of
numeric vectors, matrices, and lists of matrices. They also have no
sense of a coordinate reference system (CRS).

But more often we want to have a vector of geometries. A vector of
geometries is stored in an `sfc`

object.

###
`sfc`

objects

`sfc`

is short for simple feature column. Under the hood,
this is just a list of `sfg`

objects. You might think that
you could create a vector of geometries by combining them using
`c()`

but that would be wrong.

```
c(pnt, pnt)
#> MULTIPOINT ((0 10), (0 10))
c(pnt, ln)
#> GEOMETRYCOLLECTION (POINT (0 10), LINESTRING (0 0, 1 0))
```

`sfg`

objects behave like scalars so combining them
creates either a multi- type of the `sfg`

or a geometry
collection (another type of simple feature).

To create a vector of geometries you must use `st_sfc()`

.
`st_sfc()`

is the construct for creating a “simple feature
geometry list column.” It lets you also set a number of attributes that
are associated with the vector.

```
st_sfc(pnt, pnt)
#> Geometry set for 2 features
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 0 ymin: 10 xmax: 0 ymax: 10
#> CRS: NA
#> POINT (0 10)
#> POINT (0 10)
```

`sfc`

objects have attributes such as a CRS, bounding box,
and precision. `sfc`

objects behave more like the vectors
that you are familiar with .

```
c(
st_sfc(pnt, pnt),
st_sfc(pnt),
st_sfc(ln)
)
#> Geometry set for 4 features
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: 0 ymin: 0 xmax: 1 ymax: 10
#> CRS: NA
#> POINT (0 10)
#> POINT (0 10)
#> POINT (0 10)
#> LINESTRING (0 0, 1 0)
```

Since `sfc`

objects are vectors, they can be included as a
column in a data frame.

```
df <- data.frame(
geo = st_sfc(pnt, pnt)
)
df
#> geometry
#> 1 POINT (0 10)
#> 2 POINT (0 10)
class(df)
#> [1] "data.frame"
```

Having geometry included in a data frame is a huge win for the R community because this means they can included attributes along with the geometries.

For example we can create a vector of points from the `x`

and `y`

columns from the `diamonds`

dataset in
`ggplot2`

.

```
data(diamonds, package = "ggplot2")
pnts <- purrr::map2(
diamonds$x,
diamonds$y,
function(.x, .y) st_point(c(.x, .y))
) |>
st_sfc()
```

We can included this in a data frame

```
library(dplyr, warn.conflicts = FALSE)
dmnd <- diamonds |>
select(price, clarity) |>
bind_cols(geometry = pnts)
head(dmnd)
#> # A tibble: 6 × 3
#> price clarity geometry
#> <int> <ord> <POINT>
#> 1 326 SI2 (3.95 3.98)
#> 2 326 SI1 (3.89 3.84)
#> 3 327 VS1 (4.05 4.07)
#> 4 334 VS2 (4.2 4.23)
#> 5 335 SI2 (4.34 4.35)
#> 6 336 VVS2 (3.94 3.96)
```

Now we have a tibble with price, clarity, and geometry! Huge win! What if we wanted to calculate the average price by clarity and keep the geometries?

```
dmnd |>
group_by(clarity) |>
summarise(avg_price = mean(price))
#> # A tibble: 8 × 2
#> clarity avg_price
#> <ord> <dbl>
#> 1 I1 3924.
#> 2 SI2 5063.
#> 3 SI1 3996.
#> 4 VS2 3925.
#> 5 VS1 3839.
#> 6 VVS2 3284.
#> 7 VVS1 2523.
#> 8 IF 2865.
```

Well, we lose the geometry just like we would for any other column
that wasn’t included in the `summarise()`

call. To keep it,
we would need to perform an operation on the geometry itself.

```
dmnd |>
group_by(clarity) |>
summarise(
avg_price = mean(price),
geometry = st_union(geometry)
)
#> # A tibble: 8 × 3
#> clarity avg_price geometry
#> <ord> <dbl> <MULTIPOINT>
#> 1 I1 3924. ((4.33 4.29), (4.33 4.36), (4.36 4.33), (4.38 4.42), (4.39 …
#> 2 SI2 5063. ((0 0), (0 6.62), (3.79 3.75), (3.84 3.82), (3.87 3.85), (3…
#> 3 SI1 3996. ((3.88 3.84), (3.89 3.84), (3.9 3.85), (3.93 3.96), (3.95 3…
#> 4 VS2 3925. ((0 0), (3.73 3.68), (3.73 3.71), (3.74 3.71), (3.76 3.73),…
#> 5 VS1 3839. ((0 0), (3.83 3.85), (3.84 3.87), (3.86 3.89), (3.88 3.9), …
#> 6 VVS2 3284. ((3.83 3.86), (3.85 3.89), (3.85 3.9), (3.85 3.91), (3.86 3…
#> 7 VVS1 2523. ((0 0), (3.83 3.85), (3.87 3.9), (3.88 3.95), (3.88 3.99), …
#> 8 IF 2865. ((3.86 3.88), (3.89 3.9), (3.91 3.95), (3.92 3.94), (3.93 3…
```

Well, wouldn’t it be nice if the geometry knew to do that? Well, that
is exactly what `sf`

objects are.

##
`sf`

objects

`sf`

objects are simply just data frames with a geometry
column that is sticky and smart. We can create an `sf`

object
if an `sfc`

column is present in a data frame by using
`st_as_sf()`

.

`dmnd_sf <- st_as_sf(dmnd)`

Doing this creates an object of class `sf`

. The two things
that make an `sf`

object so special are the class
`sf`

and the the attribute `sf_column`

.

```
attr(dmnd_sf, "sf_column")
#> [1] "geometry"
```

This attribute tells us which column is the geometry. Because we have
this `sf`

can implement its own methods for common functions
like `select()`

, `mutate()`

,
`aggregate()`

, `group_by()`

, etc which always keep
the `attr(x, "sf_column")`

attached to the data frame.

Having the class and attribute allow methods like
`summarise()`

to be written for the class itself and handle
the things like unioning geometry for us.

```
dmnd_sf |>
group_by(clarity) |>
summarise(avg_price = mean(price))
#> Simple feature collection with 8 features and 2 fields
#> Geometry type: MULTIPOINT
#> Dimension: XY
#> Bounding box: xmin: 0 ymin: 0 xmax: 10.74 ymax: 58.9
#> CRS: NA
#> # A tibble: 8 × 3
#> clarity avg_price geometry
#> <ord> <dbl> <MULTIPOINT>
#> 1 I1 3924. ((4.33 4.29), (4.33 4.36), (4.36 4.33), (4.38 4.42), (4.39 …
#> 2 SI2 5063. ((0 0), (0 6.62), (3.79 3.75), (3.84 3.82), (3.87 3.85), (3…
#> 3 SI1 3996. ((3.88 3.84), (3.89 3.84), (3.9 3.85), (3.93 3.96), (3.95 3…
#> 4 VS2 3925. ((0 0), (3.73 3.68), (3.73 3.71), (3.74 3.71), (3.76 3.73),…
#> 5 VS1 3839. ((0 0), (3.83 3.85), (3.84 3.87), (3.86 3.89), (3.88 3.9), …
#> 6 VVS2 3284. ((3.83 3.86), (3.85 3.89), (3.85 3.9), (3.85 3.91), (3.86 3…
#> 7 VVS1 2523. ((0 0), (3.83 3.85), (3.87 3.9), (3.88 3.95), (3.88 3.99), …
#> 8 IF 2865. ((3.86 3.88), (3.89 3.9), (3.91 3.95), (3.92 3.94), (3.93 3…
```

Note that this is just like what we wrote earlier except that we
didn’t have to handle the geometry column manually. If we compare these
two approaches and ignore attribute difference like
`sf_column`

and `class`

we can see that they are
identical.

## Freeing yourself

Knowing how `sf`

works can be helpful for understanding
how we can ease our reliance on it for all geospatial operations.
Instead of thinking of the entire sf data frame as the thing that
handles all geometry operations we now know that it is the
`sfc`

geometry column.

In R there are different ways of representing geometry besides
`sfc`

vectors. The packages `s2`

,
`geos`

, `wk`

, and `rsgeo`

all provide
different vectors of geometries that can be used.

These libraries are very handy for doing geometric operations. Each
of these packages tend to be better at one thing than another and each
have their place. You will often find big speed improvements if you use
these libraries instead of `sf`

for certain things.

Take for example calculating the length of linestrings on a geodesic.
We can use the `roxel`

dataset from
sfnetworks.

`data(roxel, package = "sfnetworks")`

To illustrate we can extract the geometry column and cast it as an
`rsgeo`

class object.

We can then use the functions `st_length()`

and
`length_haversine()`

respectively to compute the length of
linestrings.

```
bench::mark(
st_length(geo),
length_haversine(rs),
check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 st_length(geo) 2.28ms 2.37ms 409. 2.89MB 17.1
#> 2 length_haversine(rs) 322.55µs 756.37µs 1301. 8.8KB 0
```

This is markedly faster when using rsgeo. We also do not have to extract the vector and work with it as its own object. Any vector can be included in a data frame, remember?

```
roxel |>
as_tibble() |>
mutate(
geometry = as_rsgeo(geometry),
length = length_haversine(geometry)
)
#> # A tibble: 851 × 4
#> name type
#> <chr> <fct>
#> 1 Havixbecker Strasse residential
#> 2 Pienersallee secondary
#> 3 Schulte-Bernd-Strasse residential
#> 4 NA path
#> 5 Welsingheide residential
#> 6 NA footway
#> 7 NA footway
#> 8 NA path
#> 9 NA track
#> 10 NA track
#> # ℹ 841 more rows
#> # ℹ 2 more variables: geometry <r_LINEST>, length <dbl>
```