First load some packages needed for fitting models and producing plots.
Next load the data for the case study. The data are distributed with
spotr
and consists of 20 years of monitoring data of the
common cuckoo in Sweden.
data(cuckoo)
head(cuckoo)
## count yr route county lon lat
## 1 4 2000 SFTstd:siteId:1 Skåne 13.2294 55.5649
## 2 2 2001 SFTstd:siteId:1 Skåne 13.2294 55.5649
## 3 3 2002 SFTstd:siteId:1 Skåne 13.2294 55.5649
## 4 1 2003 SFTstd:siteId:1 Skåne 13.2294 55.5649
## 5 4 2005 SFTstd:siteId:1 Skåne 13.2294 55.5649
## 6 10 2006 SFTstd:siteId:1 Skåne 13.2294 55.5649
Transform the data into an sf
object for plotting and
for spatial manipulation. Also load the map of Sweden contained in the
package.
cuckoo = st_as_sf(cuckoo, coords = c("lon", "lat"), crs = st_crs(4326), remove = FALSE)
data(swe_map)
Plot the data for an example year
To compute indices a model that can generate predictions of
abundances across time and space is needed. For example, a tensor
product of a temporal and a spatial smooth can be fit with
mgcv
:
A data frame containing the spatial locations as well as any variables needed for predicting from the model is now needed. If there are no spatio-temporal explanatory variables, as in the model above, the data frame may be defined from the combination of a data frame containing a spatial grid and a data frame containing temporal variables.
First define a spatial grid.
spat_grid = st_make_grid(cuckoo, cellsize = .5, what = "centers") |> st_sf() |>
st_intersection(swe_map)
# Add columns with longitude and latitude
spat_grid[, c("lon", "lat")] = st_coordinates(spat_grid)
ggplot(swe_map) + geom_sf() + geom_sf(data = spat_grid)
The grid now contains at least one point for each county in the map, which will be important later when producing county-wise indices.
To create a spatio-temporal grid we combine the spatial grid with a
temporal grid containing the one temporal variable yr
.
This results in a prediction grid that is balanced across space and time, i.e. each time point occurs the same number of times in the grid, as does each spatial point.
The spatio-temporal grid can now be used to compute an index for the total Swedish population. The result is a data frame with an index for each year and corresponding uncertainty estimates.
swe_ind = index(gam_fit, timevar = "yr", newdata = pred_points)
head(swe_ind)
## yr index ci_2.5 ci_10 ci_90 ci_97.5 se se_log
## 1 2000 1.000000 1.0000000 1.0000000 1.000000 1.000000 0.00000000 0.00000000
## 2 2001 1.029516 0.9805629 0.9948450 1.056293 1.071458 0.02355728 0.02297323
## 3 2002 1.055186 0.9676306 0.9922504 1.108208 1.135164 0.04388539 0.04180741
## 4 2003 1.071392 0.9578370 0.9938381 1.140877 1.177611 0.05650488 0.05302884
## 5 2004 1.075826 0.9495374 0.9918479 1.152063 1.190805 0.06070837 0.05677877
## 6 2005 1.073879 0.9471606 0.9890586 1.151704 1.189762 0.06151820 0.05769944
## baseline
## 1 808.4534
## 2 808.4534
## 3 808.4534
## 4 808.4534
## 5 808.4534
## 6 808.4534
swe_ind |>
ggplot(aes(x = yr, y = index)) +
geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") +
geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd =1) +
geom_point() + ylim(.8,1.2)
Confidence intervals for the index are fairly wide. Part of the
reason for this may be that it is hard to estimate population size in
the first year when few sites were surveyed. One can instead use the
mean abundance over a sequence of years as the baseline using the
baseline
argument to index
:
index(gam_fit, timevar = "yr", newdata = pred_points, baseline = 2000:2010) |>
ggplot(aes(x = yr, y = index)) +
geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") +
geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd =1) +
geom_point() + ylim(.8,1.2)
This results in lower uncertainty because the mean over 2000 - 2010 can be more accurately estimated.
Instead of estimating an index relative to a baseline year or period,
the growth rate between consecutive years can be estimated by setting
type = "delta"
.
index(gam_fit, timevar = "yr", newdata = pred_points, type = "delta") |>
ggplot(aes(x = yr, y = index)) +
geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") +
geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd =1) +
geom_point()
Note that the index for each year here corresponds to the estimated growth rate between that year and the previous year. E.g., the index for year 2005 is the change in abundance in year 2005 compared to 2004.
Using the byvar
argument of index
, indices
within groups can be computed. This can be used for example to compute
indices for each county:
index(gam_fit, timevar = "yr", byvar = "county", newdata = pred_points, baseline = 2000:2010) |>
ggplot(aes(x = yr, y = index, ymin = ci_2.5, ymax = ci_97.5)) +
geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") +
geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd = 1) +
geom_point(size=.5) +
scale_x_continuous(breaks = c(2005,2015)) + facet_wrap(vars(county))
The indices above are scaled relative to the mean across 2000-2010
within each county. This means that information about relative
abundance between counties is lost. To display such differences
the argument type
can be set to "global"
. The
indices then are computed by aggregation of abundances within each
county in the numerator, but aggregating across the whole grid in the
denominator. The indices then represent the abundance in the counties
relative to the total abundance in the baseline period.
index(gam_fit, timevar = "yr", byvar = "county", newdata = pred_points, baseline = 2000:2010, type = "global") |>
ggplot(aes(x = yr, y = index, ymin = ci_2.5, ymax = ci_97.5)) +
geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") +
geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd = 1) +
geom_point(size=.5) +
scale_x_continuous(breaks = c(2005,2015)) + facet_wrap(vars(county))
The results here suggest that the northernmost county, Norrbotten, which
is also the largest county, contains 20-30% of the Swedish population
(with the period 2000-2010 as the reference).
The above indices clearly depend on the size of the county. It could
be of interest to instead compare population density across counties and
time. This can be done by supplying weights to the index
function. Two sets of weights can be specified, one for the numerator
(argument weights
) and one for the denominator (argument
bweights
). For the numerator, we use the inverse of the
number of spatial grid points within each county so that the summed
weighted abundance across the county is the within county density. For
the denominator we use the inverse of the total number of spatial grid
points times the number time points as the weights, so that the
reference is the averaged density across the whole of Sweden in the
baseline period.
# Count number of predictions points per county
pred_points = pred_points |> group_by(county, yr) |> mutate(ppc = n())
w = 1/pred_points$ppc
# Denominator weights
bw = 1/(nrow(spat_grid) * length(2000:2010))
index(gam_fit, timevar = "yr", byvar = "county", newdata = pred_points, weights = w, bweights = bw, baseline = 2000:2010, type = "global") |>
ggplot(aes(x = yr, y = index, ymin = ci_2.5, ymax = ci_97.5)) +
geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") +
geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd = 1) +
geom_point(size=.5) +
scale_x_continuous(breaks = c(2005,2015)) + facet_wrap(vars(county))
This suggests e.g. that the density in Kalmar is around 50% larger then
the national density, and that the density in Västra Götaland is around
half of the national density.