Geographic Data Collection and Spatial data

KoboToolbox supports the collection of spatial data through three question types: geopoint, geotrace and geoshape.

Geopoint:
The geopoint question type captures a single geographic coordinate (latitude and longitude) including altitude and accuracy. This is useful for marking locations, such as homes, schools, or water sources.

Geotrace:
The geotrace question type collects a series of connected geographic coordinates, forming a line. This can be used to map routes, paths, or boundaries.

Geoshape:
A geoshape question type captures a series of geographic coordinates that form a closed polygon. This is useful for defining areas, such as land parcels, agricultural fields, or protected zones.

To use these data types, we need to parse them into a GIS-friendly format. robotoolbox uses Well-Known Text (WKT), a standard text markup language for representing vector geometry, to represent points (geopoint), lines (geotrace) and polygons (geoshape). WKT was chosen for its wide compatibility with GIS software and spatial analysis packages, which makes it easier to integrate KoboToolbox data into spatial analysis workflows.

Spatial data

The following form provides a simple demonstration of how robotoolbox maps spatial field types.

Survey questions

name                 type      label
point                geopoint  Record a location
point_description    text      Describe the recorded location
line                 geotrace  Record a line
line_description     text      Describe the recorded line
polygon              geoshape  Record a polygon
polygon_description  text      Describe the recorded polygon

The form includes three spatial type columns: point, line and polygon.

Loading the project

The form above, named Spatial data, was uploaded to the server. You can find it in the list of assets returned by kobo_asset_list() and load it by its uid.

library(robotoolbox)
library(dplyr)

# Retrieve a list of all assets (projects) from your KoboToolbox server
asset_list <- kobo_asset_list()

# Filter the asset list to find the specific project and get its unique identifier (uid)
uid <- filter(asset_list, name == "Spatial data") |>
  pull(uid)

# Load the specific asset (project) using its uid
asset <- kobo_asset(uid)
asset
#> <robotoolbox asset>  a9NCKTJxBPKdy49gX57WL5 
#>   Asset name: Spatial data
#>   Asset type: survey
#>   Asset owner: dickoa
#>   Created: 2023-04-22 11:57:54
#>   Last modified: 2023-04-22 12:01:39
#>   Submissions: 1

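In this code, kobo_asset_list() retrieves all assets, filter() and pull() extract the uid of the Spatial data project, and kobo_asset() loads that asset.

The project has a single submission, in which we recorded one location using a geopoint question, mapped a portion of a road using a geotrace question, and outlined a stadium using a geoshape question.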

Extracting the data

From the asset, we can extract the submissions with kobo_data().

df <- kobo_data(asset)
glimpse(df)
#> Rows: 1
#> Columns: 24
#> $ point                <chr> "14.719783 -17.459261 0 0"
#> $ point_latitude       <dbl> 14.71978
#> $ point_longitude      <dbl> -17.45926
#> $ point_altitude       <dbl> 0
#> $ point_precision      <dbl> 0
#> $ point_wkt            <chr> "POINT (-17.459261 14.719783 0)"
#> $ point_description    <chr> "Jardin Liberte"
#> $ line                 <chr> "14.726129 -17.500409 0 0;14.726253 -17.498993 0 …
#> $ line_wkt             <chr> "LINESTRING (-17.500409 14.726129 0, -17.498993 1…
#> $ line_description     <chr> "Route de la Corniche"
#> $ polygon              <chr> "14.747328 -17.452461 0 0;14.747743 -17.451869 0 …
#> $ polygon_wkt          <chr> "POLYGON ((-17.452461 14.747328 0, -17.451869 14.…
#> $ polygon_description  <chr> "Stade Leopold Sedar Senghor"
#> $ `_id`                <int> 28557821
#> $ uuid                 <chr> "01c7d7250bd84ac9b604199ca98daa84"
#> $ `__version__`        <chr> "v7nQkzvEV64YLAfEQv5prV"
#> $ instanceID           <chr> "uuid:26c66ec5-935a-4220-8902-6de928330122"
#> $ `_xform_id_string`   <chr> "a9NCKTJxBPKdy49gX57WL5"
#> $ `_uuid`              <chr> "26c66ec5-935a-4220-8902-6de928330122"
#> $ `_status`            <chr> "submitted_via_web"
#> $ `_submission_time`   <dttm> 2023-04-22 12:07:29
#> $ `_validation_status` <chr> NA
#> $ `_submitted_by`      <lgl> NA
#> $ `_attachments`       <list> <NULL>

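The data contain our three spatial columns: point, line and polygon. Each of them has a corresponding WKT column: point_wkt, line_wkt and polygon_wkt.

The WKT format for a point is POINT (longitude latitude), optionally followed by a third value for altitude, which robotoolbox includes. Note that the order is reversed relative to the raw geopoint value, which lists latitude first. For example, POINT (-17.446667 14.692778) represents a location in Dakar, Senegal.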

pull(df, point)
#> [1] "14.719783 -17.459261 0 0"
#> attr(,"label")
#> [1] "Record a location"
pull(df, point_wkt)
#> [1] "POINT (-17.459261 14.719783 0)"
#> attr(,"label")
#> [1] "point_wkt"
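The relationship between the raw value and its WKT form can be sketched in base R. The helper geopoint_to_wkt() below is hypothetical and written only for illustration; it is not part of robotoolbox, and it assumes the raw string always holds the four values "latitude longitude altitude accuracy".

```r
# Hypothetical helper (not a robotoolbox function): convert one raw geopoint
# string "lat lon alt accuracy" into a WKT POINT in lon, lat, alt order.
geopoint_to_wkt <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")[[1]]
  # parts[1] = latitude, parts[2] = longitude, parts[3] = altitude;
  # the accuracy (parts[4]) has no place in WKT and is dropped
  sprintf("POINT (%s %s %s)", parts[2], parts[1], parts[3])
}

raw_point <- "14.719783 -17.459261 0 0"
point_wkt <- geopoint_to_wkt(raw_point)
print(point_wkt)
```

The result matches the point_wkt value shown above: longitude and latitude are swapped, and the accuracy is discarded.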

For geopoint types, robotoolbox also offers columns for latitude, longitude, altitude, and precision.

df |>
  select(starts_with("point_"))
#> # A tibble: 1 × 6
#>   point_latitude point_longitude point_altitude point_precision point_wkt       
#>            <dbl>           <dbl>          <dbl>           <dbl> <chr>           
#> 1           14.7           -17.5              0               0 POINT (-17.4592…
#> # ℹ 1 more variable: point_description <chr>
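The separate latitude and longitude columns offer another route to a point geometry, without going through WKT. The following is a minimal sketch, assuming the sf package is installed; the data frame pts is built by hand from the values shown above rather than fetched from the server.

```r
library(sf)

# Hand-built stand-in for the geopoint columns returned by kobo_data()
pts <- data.frame(
  point_latitude = 14.719783,
  point_longitude = -17.459261,
  point_description = "Jardin Liberte"
)

# coords must be given in x, y order, i.e. longitude first, then latitude
point_from_coords <- st_as_sf(pts,
                              coords = c("point_longitude", "point_latitude"),
                              crs = 4326)
print(point_from_coords)
```

This route produces a two-dimensional point, since the altitude column is not passed to coords.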

The line column, derived from the geotrace question, has a corresponding line_wkt column.

The WKT format for a line is LINESTRING (lon1 lat1, lon2 lat2, ...). Each pair of coordinates represents a point along the line. For example, LINESTRING (-17.4440 14.6937, -17.4502 14.7167) represents a line connecting two points in Dakar.

pull(df, line)
#> [1] "14.726129 -17.500409 0 0;14.726253 -17.498993 0 0;14.725688 -17.498002 0 0;14.72527 -17.497068 0 0;14.724897 -17.496113 0 0;14.72438 -17.495383 0 0;14.723737 -17.494784 0 0"
#> attr(,"label")
#> [1] "Record a line"
pull(df, line_wkt)
#> [1] "LINESTRING (-17.500409 14.726129 0, -17.498993 14.726253 0, -17.498002 14.725688 0, -17.497068 14.72527 0, -17.496113 14.724897 0, -17.495383 14.72438 0, -17.494784 14.723737 0)"
#> attr(,"label")
#> [1] "line_wkt"
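The same idea extends to a geotrace: the raw value is a list of vertices separated by semicolons, and each vertex becomes a "lon lat alt" triple in the LINESTRING. The helper geotrace_to_wkt() below is a hypothetical illustration in base R, not a robotoolbox function, and it uses only the first three vertices of the recorded line to keep the example short.

```r
# Hypothetical helper (not a robotoolbox function): convert a raw geotrace
# string "lat lon alt acc;lat lon alt acc;..." into a WKT LINESTRING.
geotrace_to_wkt <- function(x) {
  vertices <- strsplit(x, ";", fixed = TRUE)[[1]]
  coords <- vapply(
    strsplit(trimws(vertices), "\\s+"),
    function(p) paste(p[2], p[1], p[3]),  # reorder to lon, lat, alt
    character(1)
  )
  sprintf("LINESTRING (%s)", paste(coords, collapse = ", "))
}

# First three vertices of the line recorded above
raw_line <- "14.726129 -17.500409 0 0;14.726253 -17.498993 0 0;14.725688 -17.498002 0 0"
line_wkt <- geotrace_to_wkt(raw_line)
print(line_wkt)
```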

Lastly, polygon_wkt is the WKT column derived from the geoshape question named polygon.

The WKT format for a polygon is POLYGON ((lon1 lat1, lon2 lat2, ..., lon1 lat1)). Note that the first and last coordinate pairs are the same, closing the polygon. For example, POLYGON ((-17.4440 14.6937, -17.4502 14.7167, -17.4314 14.7145, -17.4440 14.6937)) represents a triangular area in Dakar.

pull(df, polygon)
#> [1] "14.747328 -17.452461 0 0;14.747743 -17.451869 0 0;14.747519 -17.451477 0 0;14.747244 -17.451332 0 0;14.746378 -17.451332 0 0;14.745989 -17.451563 0 0;14.745844 -17.451987 0 0;14.746062 -17.45232 0 0;14.74627 -17.452492 0 0;14.747328 -17.452461 0 0"
#> attr(,"label")
#> [1] "Record a polygon"
pull(df, polygon_wkt)
#> [1] "POLYGON ((-17.452461 14.747328 0, -17.451869 14.747743 0, -17.451477 14.747519 0, -17.451332 14.747244 0, -17.451332 14.746378 0, -17.451563 14.745989 0, -17.451987 14.745844 0, -17.45232 14.746062 0, -17.452492 14.74627 0, -17.452461 14.747328 0))"
#> attr(,"label")
#> [1] "polygon_wkt"
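The closing rule can be checked directly on a raw geoshape value. The sketch below uses a hypothetical base R helper, is_closed_ring(), which is not part of robotoolbox; it assumes vertices are separated by semicolons and that a valid ring has at least four vertices, counting the repeated closing one. The test string is a shortened version of the polygon recorded above.

```r
# Hypothetical helper (not a robotoolbox function): TRUE when the first and
# last vertices of a raw geoshape string are identical and the ring has at
# least four vertices (three distinct corners plus the closing one).
is_closed_ring <- function(x) {
  vertices <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  length(vertices) >= 4 && identical(vertices[1], vertices[length(vertices)])
}

# Shortened ring: three corners of the recorded polygon plus the closing vertex
raw_polygon <- "14.747328 -17.452461 0 0;14.747743 -17.451869 0 0;14.747519 -17.451477 0 0;14.747328 -17.452461 0 0"
closed <- is_closed_ring(raw_polygon)

# Same corners with a different last vertex: the ring is not closed
raw_open <- "14.747328 -17.452461 0 0;14.747743 -17.451869 0 0;14.747519 -17.451477 0 0;14.747244 -17.451332 0 0"
open_ring <- is_closed_ring(raw_open)

print(c(closed = closed, open = open_ring))
```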

Now that we understand how robotoolbox stores spatial question types, we can convert these columns into spatial objects suitable for spatial data analysis.

Geopoint

The standard approach to manipulating spatial vector data in R is the sf package. sf stands for Simple Features; it extends a data.frame with a geometry list-column, producing a spatially enabled data.frame. It provides an interface to the popular GDAL, GEOS, PROJ and S2 libraries and can be used to efficiently manipulate and visualize spatial vector data.

Creating an sf object from a text column that contains WKT strings is straightforward. The sf::st_as_sf function turns a data.frame with a WKT column into an sf object.

library(sf)
library(mapview)
# df is the data.frame returned by kobo_data(asset)
point_sf <- st_as_sf(df,
                     wkt = "point_wkt", crs = 4326)
mapview(point_sf)

In this code, crs = 4326 specifies the Coordinate Reference System (CRS) for the spatial data. CRS 4326 refers to the WGS84 (World Geodetic System 1984) coordinate system, which is widely used in GPS and web mapping applications. It represents locations on the Earth using latitude and longitude in degrees. This is the standard CRS used by KoboToolbox for storing geographic coordinates.
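Because EPSG:4326 coordinates are expressed in degrees, distance and area calculations often call for a projected CRS. The sketch below reprojects the recorded point to WGS 84 / UTM zone 28N (EPSG:32628), whose zone covers Dakar. The choice of zone is an assumption suited to this example, and the point is rebuilt by hand from the WKT shown earlier rather than fetched from the server.

```r
library(sf)

# Hand-built stand-in for the point_wkt column returned by kobo_data()
pt <- st_as_sf(data.frame(point_wkt = "POINT (-17.459261 14.719783 0)"),
               wkt = "point_wkt", crs = 4326)

# Reproject from geographic coordinates (degrees) to UTM zone 28N (metres)
pt_utm <- st_transform(pt, 32628)

# Easting and northing are now expressed in metres
utm_coords <- st_coordinates(pt_utm)
print(utm_coords)
```

For data collected elsewhere, a different projected CRS would be needed.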

Geotrace

We can also transform a data.frame with a column from a geotrace question to an sf object with a LINESTRING geometry. The WKT column is named line_wkt.

line_sf <- st_as_sf(df,
                    wkt = "line_wkt", crs = 4326)
mapview(line_sf)
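Once the line is an sf object, its length can be measured. This is a minimal sketch, assuming the sf package is installed; the line is rebuilt by hand from the line_wkt value shown earlier, and the altitude is dropped with st_zm() so that only longitude and latitude enter the calculation.

```r
library(sf)

# Hand-built stand-in for the line_wkt column returned by kobo_data()
line_df <- data.frame(
  line_wkt = "LINESTRING (-17.500409 14.726129 0, -17.498993 14.726253 0, -17.498002 14.725688 0, -17.497068 14.72527 0, -17.496113 14.724897 0, -17.495383 14.72438 0, -17.494784 14.723737 0)"
)
line <- st_as_sf(line_df, wkt = "line_wkt", crs = 4326)

# Drop the Z (altitude) dimension before measuring
line_2d <- st_zm(line)

# For geographic coordinates, st_length() returns a length in metres
line_length <- st_length(line_2d)
print(line_length)
```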

Geoshape

The column polygon_wkt can be used to create an sf polygon object. It’s a simple closed polygon.

poly_sf <- st_as_sf(df,
                    wkt = "polygon_wkt", crs = 4326)
mapview(poly_sf)
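The area enclosed by the polygon can be computed in the same way. This is a minimal sketch, assuming the sf package is installed; the polygon is rebuilt by hand from the polygon_wkt value shown earlier, and the altitude is dropped with st_zm() before measuring.

```r
library(sf)

# Hand-built stand-in for the polygon_wkt column returned by kobo_data()
poly_df <- data.frame(
  polygon_wkt = "POLYGON ((-17.452461 14.747328 0, -17.451869 14.747743 0, -17.451477 14.747519 0, -17.451332 14.747244 0, -17.451332 14.746378 0, -17.451563 14.745989 0, -17.451987 14.745844 0, -17.45232 14.746062 0, -17.452492 14.74627 0, -17.452461 14.747328 0))"
)
poly <- st_as_sf(poly_df, wkt = "polygon_wkt", crs = 4326)

# Drop the Z (altitude) dimension before measuring
poly_2d <- st_zm(poly)

# For geographic coordinates, st_area() returns an area in square metres
poly_area <- st_area(poly_2d)
print(poly_area)
```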

Conclusion

By combining robotoolbox with R spatial analysis tools, researchers and data analysts can efficiently process, analyze, and visualize geographic data collected through KoboToolbox. This supports spatial data analysis in fields such as humanitarian work, environmental studies, and social sciences.

You can learn more about the sf package and spatial data analysis with R from the excellent Geocomputation with R book and the extensive sf package documentation.