Bivariate Choropleths with biscale package – new SI prefix in legend

Notice the k for thousands. Easily add SI prefix with the biscale package (by Chris Prener, in-review PR by Rick Pack)

Chris Prener‘s function-rich biscale R package (Github, GitHub.io) makes the production of bivariate choropleth maps much easier than prior to its development. Kyle Walker‘s tidycensus R package (Github, Story Page) makes accessing and visualizing U.S. Census data far easier. Like others, I have enjoyed marrying the two.

I am happy to announce a contribution to the biscale package that makes printing shorter labels using SI prefixes (e.g., 1,000,003 => 1M and 1,324 => 1.3k) far easier. This makes printing the legend in an attractive manner easier given width concerns, although you can tell by the picture above that I still struggle with optimal uses of multiple-plot-on-the-same-page depicting functions like the cowplot package’s draw_plot() and my current use of grid.arrange() from gridExtra. I would love for the legend and map to be centered under the title.

The new si_levels argument for bi_class_breaks() takes a logical value of TRUE or FALSE for either a single or two-unit vector, with a single unit vector causing the specified value to be applied to both the X and Y variables. This matches Prener’s convenient functionality for the number of digits function dig_lab, as he requested in the Github Issue I created for this addition. Note that si_levels rounds the input number, if appropriate, based on the digits indicated by dig_lab, which defaults to 3.

Most will probably set si_levels to TRUE for uniformity of both the X and Y variables’ labels. I was surprised to see that options(scipen = 999, digits = 22) did not eliminate the scientific notation (e.g., e+03).

For now, you will need to install the package from my Github page via devtools::install_github('RickPack/biscale'). Please test it out and let me / us know how it works for you! I have an outstanding Github Pull Request here and welcome your comments.

Change the state and county values at the top of the code below to make similar maps for other cities, saved into save_folder.

All of the maps above excluded where the population density was less than 1000 people per square mile. The st_area function from the sf package was used to calculate the area in square meters, which was then divided by 1609^2 to yield area in square miles (1609 meters in a mile).

Notice in the maps below that the caption shows varying population impacts of excluding census tracts with a population density < 1000 residents / square mile. Miami-Dade’s map shows far less of an impact visually than Wake County, NC, reflecting the differences of population not represented (3% vs. 23%).

By the way: Outputting an attractive image that features a large legend next to the map took a couple hours of trial and error. I am in good company because Timo Grossenbacher, whose bivariate choropleth mapping inspired Chris Prener to create the biscale R package, said:

The numeric arguments to draw_plot() are basically trial and error.

https://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/

Timo’s work is absolutely beautiful.

Timo Grossenbacherhttps://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/
## CODE DISCUSSED in blog post: https://rickpackblog.wordpress.com/2022/07/18/bivariate-choropleths-with-biscale-package-new-si-prefix-in-legend/
## Outputs into save_folder a bivariate choropleth .PNG depcicting a county's
## median income vs. median rent using Jenks Natural Breaks Classification
state_var <- "FL"
county_var <- "Miami-Dade County"
save_folder <- "C:/Users/rick2/Documents/Rick/blog/biscale/"
#### Census API key
## Only need this line because API key stored in .Renviron
## If not stored, get Census API key at http://api.census.gov/data/key_signup.html then run
## census_api_key("PLACE CENSUS KEY HERE", install = TRUE)
## per https://rdrr.io/cran/tidycensus/man/census_api_key.html
Sys.getenv("CENSUS_API_KEY")
## remotes::install_github("RickPack/biscale")
## Notice need for Rick Pack's development version of Chris Prener's biscale R package
library(biscale)
library(tidycensus)
library(ggplot2)
library(cowplot)
library(grid)
library(gridExtra)
library(sf)
library(dplyr)
library(scales)
# census_vars <- tidycensus::load_variables(year = 2020, dataset = "acs5")
# used commented out census_vars to identify
# Census API variable names like B02001_001 for get_acs()
df_rent_income <- tidycensus::get_acs(
geography = "tract", year = 2020,
variables = c(
Total_Population = "B02001_001",
Median_Income = "B19013_001",
Median_Rent = "B25031_001"
),
state = state_var, county = county_var,
geometry = TRUE, cache = TRUE,
output = "wide")
df_rent_income <- df_rent_income %>%
rename(Median_Income = Median_IncomeE,
Median_Rent = Median_RentE)
df_rent_income <- df_rent_income %>%
mutate(area_m2 = st_area(geometry))
df_rent_income$population_density <-
round(
df_rent_income$Total_PopulationE /
# convert square meters to square miles
(as.numeric(df_rent_income$area_m2) / 1609**2),
2)
## Large census tracts in places like Miami consume a great deal of space while
## having relatively few people
df_rent_income2 <- df_rent_income %>%
dplyr::filter(population_density >= 1000)
total_pop <- sum(df_rent_income$Total_PopulationE)
eliminated_low_density_pop <- sum(
df_rent_income %>%
dplyr::filter(population_density <= 1000) %>%
pull(Total_PopulationE), na.rm = TRUE)
eliminated_low_density_popC <- paste0(
eliminated_low_density_pop,
" residents (",
scales::percent(eliminated_low_density_pop/ total_pop),
")")
# Still have census tract of 14022531 [m^2]
max(df_rent_income2$area_m2)
# Warning is okay. Not all Census Tracts are associated with median income or rent
# "var has missing values, omitted in finding classes"
df_rent_income_bivar <-
biscale::bi_class(
df_rent_income2,
x = Median_Rent,
y = Median_Income, style = "quantile",
dim = 3, dig_lab = c(3,3))
# shows new si_levels argument for SI prefix (and rounding)
# be sure to use the same arguments otherwise as in bi_class()
labels1 <- biscale::bi_class_breaks(
df_rent_income2,
x = Median_Rent,
y = Median_Income,
style = "jenks",
dim = 3, dig_lab = c(3,3),
si_levels = TRUE)
# create map
df_rent_income_bivar_map <-
ggplot() +
geom_sf(data = df_rent_income_bivar,
mapping = aes(fill = bi_class),
## remove borders
color = NA,
size = 0.1, show.legend = FALSE) +
bi_scale_fill(pal = "DkCyan", dim = 3) +
bi_theme() +
theme(plot.title =element_text(size=14, face='bold'),
plot.subtitle = element_text(size=11, face='bold'),
plot.caption = element_text(size=10)) +
labs(
title = paste0(county_var, ", ", state_var),
subtitle = paste0("Source: Census ACS (2016-2020)"),
caption = paste0("Excludes Census Tracts of Population Density\n",
"< 1000 residents / sq. mile = ",
eliminated_low_density_popC, ".",
"\n, Packages used include tidycensus and biscale.")
)
# create legend, notice use of labels1 from bi_class_breaks() earlier
df_rent_income_bivar_legend <- bi_legend(pal = "DkCyan",
breaks = labels1,
xlab = "Median Rent ($)",
ylab = "Median Income ($)",
size = 8)
# combine map with legend
# df_rent_income_bivar_map2 <- ggdraw() +
# draw_plot(df_rent_income_bivar_map,
# x = 0.5, y = 0, width = 0.5, height = 1.1, scale = 0.8) +
# draw_plot(df_rent_income_bivar_legend,
# x = 0.3, y = 0.30, width = 0.3, height = 0.4, scale = 1.3)
df_rent_income_bivar_map2 <-
grid.arrange(df_rent_income_bivar_legend,
df_rent_income_bivar_map,
ncol = 2,
widths = unit.c(unit(0.45, "npc"), unit(0.55, "npc")))
save_plot(paste0(save_folder,
"Income_Rent_",
## So values like Miami-Dade County only keep the letters
gsub("[^a-zA-Z]+", "",
## and only first six characters in filename
substr(county_var,1,min(6, nchar(county_var)))),
"_County_", state_var, ".png"),
plot = df_rent_income_bivar_map2)

Notes

Note for self: had to create Github Personal Access Token through R Studio using https://carpentries.github.io/sandpaper-docs/github-pat.html.

Thank you to Robert Monfrera and Eric Delmelle for the map formatting suggestions.

Early comments about this work at https://stackoverflow.com/questions/72591794/create-a-bivariate-choropleth-map-using-tidycensus.

Future work will explore dynamically adjusting the excluded area so less residents excluded.

Advertisement

3 thoughts on “Bivariate Choropleths with biscale package – new SI prefix in legend

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s