Introduction to bivariateLeaflet

Introduction

The bivariateLeaflet package provides tools for creating interactive bivariate choropleth maps using Leaflet. Bivariate choropleth maps allow you to visualize the relationship between two variables simultaneously across geographic regions.

Statement of need

The bivariateLeaflet package addresses a need in geospatial data visualization by providing a tool to create interactive bivariate choropleth maps using Leaflet. These maps, which enable the simultaneous visualization of relationships between two variables across geographic regions, are powerful tools for analyzing complex datasets. Despite their potential, creating bivariate maps has historically been a challenging task, requiring both technical expertise and substantial time investment.

This package simplifies the process, making bivariate mapping more accessible to users of R. It is particularly valuable for researchers and analysts working in fields such as demography, justice, environmental studies, and public health. For instance, bivariateLeaflet allows users to explore the relationship between income and education, analyze correlations between temperature and precipitation, or visualize connections between healthcare access and health outcomes. By integrating seamlessly with the spatial data format, sf, and offering an intuitive interface, the package enables users to generate interactive maps that are both visually compelling and informative.

The package’s functionality includes the ability to create bivariate maps with customizable color schemes, handle data challenges such as missing values and outliers, and provide clear, interpretable legends for effective communication. Through its integration with Leaflet, it ensures that the resulting visualizations are interactive and web-ready, enabling users to share insights widely.

Installation

You can install the package from CRAN:

install.packages("bivariateLeaflet")

Or install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("maduprey/bivariateLeaflet")

Data Preparation

The package works with spatial data from various sources. We’ll demonstrate using census data at different geographic levels.

Census Tract Level Data

library(tidycensus)
library(tidyr)
library(dplyr)
library(sf)

# Get census API key if you haven't already
# census_api_key("YOUR_KEY_HERE")

# Get ACS data for DC census tracts
tract_data <- get_acs(
  geography = "tract", 
  variables = c(
    "B01003_001", # Total population
    "B19013_001"  # Median household income
  ),
  state = "DC", 
  year = 2020,
  geometry = TRUE
)

# Pivot data to wide format
tract_data_wide <- tract_data %>%
  select(-moe) %>% 
  pivot_wider(
    names_from = variable,      
    values_from = estimate      
  )

County Level Data

# Get county-level data for entire US
county_data <- get_acs(
  geography = "county",
  variables = c(
    "B01003_001", # Total population
    "B19013_001"  # Median household income
  ),
  year = 2020,
  geometry = TRUE
)

county_data_wide <- county_data %>%
  select(-moe) %>%
  pivot_wider(
    names_from = variable,
    values_from = estimate
  )

Block Group Level Data

# Get block group data for DC
blockgroup_data <- get_acs(
  geography = "block group",
  variables = c(
    "B01003_001", # Total population
    "B19013_001"  # Median household income
  ),
  state = "DC",
  year = 2020,
  geometry = TRUE
)

blockgroup_data_wide <- blockgroup_data %>%
  select(-moe) %>%
  pivot_wider(
    names_from = variable,
    values_from = estimate
  )

Basic Usage

Let’s start with creating a basic bivariate choropleth map using the DC census tract data:

library(bivariateLeaflet)

# Create basic map
tract_map <- create_bivariate_map(
  data = tract_data_wide,
  var_1 = "B01003_001",    # Total population
  var_2 = "B19013_001"     # Median household income
)

# Display the map
tract_map

Understanding the Default Color Scheme

The default color scheme uses a 3x3 matrix where:

Darker blues indicate higher values in both variables
Lighter colors indicate lower values
The diagonal represents areas where both variables have similar relative values

# Display the default color matrix
create_default_color_matrix()
#>      [,1]      [,2]      [,3]     
#> [1,] "#64acbe" "#627f8c" "#574249"
#> [2,] "#b0d5df" "#ad9ea5" "#985356"
#> [3,] "#e8e8e8" "#e4acac" "#c85a5a"

Advanced Usage

Custom Color Schemes

sequential_colors <- matrix(c(
  "#49006a", "#2d004b", "#1a0027",
  "#8c96c6", "#8856a7", "#810f7c",
  "#edf8fb", "#bfd3e6", "#9ebcda"
), nrow = 3, byrow = TRUE)

sequential_map <- create_bivariate_map(
  data = tract_data_wide,
  var_1 = "B01003_001",
  var_2 = "B19013_001",
  color_matrix = sequential_colors
)

sequential_map

Handling Missing Data

Census data often includes missing values (NA) for various reasons. Here’s how to handle them:

# Identify tracts with missing data
missing_data <- tract_data_wide %>%
  mutate(
    missing_pop = is.na(B01003_001),
    missing_income = is.na(B19013_001)
  )

# Create a map excluding missing data
clean_map <- create_bivariate_map(
  data = missing_data %>% filter(!missing_pop & !missing_income),
  var_1 = "B01003_001",
  var_2 = "B19013_001"
)

clean_map

Working with Different Geographic Levels

County Level Map

# Create a map of US counties
county_map <- create_bivariate_map(
  data = county_data_wide,
  var_1 = "B01003_001",
  var_2 = "B19013_001"
)

county_map

Block Group Level Map

# Create a detailed block group map
blockgroup_map <- create_bivariate_map(
  data = blockgroup_data_wide,
  var_1 = "B01003_001",
  var_2 = "B19013_001"
)

blockgroup_map

Custom Labels

You can create custom tooltips for your map by providing your own labels:

# Create custom labels with tract names and formatted values
custom_labels <- sprintf(
  "<strong>Census Tract:</strong> %s<br/>
   <strong>Population:</strong> %s<br/>
   <strong>Median Income:</strong> $%s",
  tract_data_wide$NAME,
  format(tract_data_wide$B01003_001, big.mark = ","),
  format(tract_data_wide$B19013_001, big.mark = ",")
) 

# Create map with custom labels
map_custom_labels <- create_bivariate_map(
  data = tract_data_wide,
  var_1 = "B01003_001",
  var_2 = "B19013_001",
  custom_labels = custom_labels
)

map_custom_labels

This will create tooltips that show:

The census tract name
Population with thousands separator
Formatted median income with dollar sign and thousands separator

Best Practices

When creating bivariate choropleth maps:

Variable Selection
- Choose variables that have a meaningful relationship
- Consider whether the variables are independent
- Think about the story you want to tell
Data Distribution
- Check the distribution of your variables before mapping
- Consider transforming highly skewed data
- Be aware of outliers that might affect the visualization
Color Schemes
- Use colorblind-friendly palettes
- Ensure sufficient contrast between categories
- Consider your audience when choosing colors
Scale and Geography
- Choose an appropriate geographic level for your analysis
- Consider the modifiable areal unit problem (MAUP)
- Be consistent with projections
Legend and Labels
- Provide clear, descriptive labels
- Include units of measurement
- Explain how to interpret the bivariate color scheme

Error Handling

The package includes various checks and warnings:

# Missing variable
try(calculate_tertiles(data.frame(x = 1:5), "nonexistent", "also_nonexistent"))
#> Error in calculate_tertiles(data.frame(x = 1:5), "nonexistent", "also_nonexistent") : 
#>   One or both of the specified variables are not in the dataset.

# Too few unique values
test_data <- data.frame(
  var1 = c(1,1,1,2),
  var2 = c(1,2,3,4)
)
calculate_tertiles(test_data, "var1", "var2")
#>   var1 var2 var_1_tertile var_2_tertile
#> 1    1    1             1             1
#> 2    1    2             1             1
#> 3    1    3             2             2
#> 4    2    4             3             3

Common Issues and Solutions

Handling Extreme Outliers

# Identify outliers using IQR method
outlier_data <- tract_data_wide %>%
  mutate(
    pop_outlier = B01003_001 > quantile(B01003_001, 0.75, na.rm = TRUE) + 
      1.5 * IQR(B01003_001, na.rm = TRUE),
    income_outlier = B19013_001 > quantile(B19013_001, 0.75, na.rm = TRUE) + 
      1.5 * IQR(B19013_001, na.rm = TRUE)
  )

# Create map excluding outliers
no_outliers_map <- create_bivariate_map(
  data = outlier_data %>% 
    filter(!pop_outlier & !income_outlier),
  var_1 = "B01003_001",
  var_2 = "B19013_001"
)

Working with Different Projections

# Reproject data if needed
reprojected_data <- tract_data_wide %>%
  st_transform(4326)  # WGS 84

# Create map with reprojected data
projected_map <- create_bivariate_map(
  data = reprojected_data,
  var_1 = "B01003_001",
  var_2 = "B19013_001"
)

Acknowledgements

The development of this package was funded by Grant 2020-R2-CX-0027 from the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect those of the U.S. Department of Justice.