knitr::opts_chunk$set(cache = FALSE)
This notebook illustrates data access through both tigris and tidycensus, as well as joins using dplyr.
This notebook requires the following packages:
# tidyverse packages
library(dplyr) # data wrangling
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
# spatial packages
library(mapview) # preview geometric data
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
library(sf) # spatial tools
Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1
library(tidycensus) # demographic data
library(tigris) # tiger/line data
To enable caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
Attaching package: ‘tigris’
The following object is masked from ‘package:tidycensus’:
fips_codes
# other packages
library(here) # file path management
here() starts at /Users/prenercg/GitHub/slu-soc5650/module-2-combine-sources
Before using tidycensus, you need to install a census API key. Use the syntax below, copied into your console, to install the key you received via email.
census_api_key("KEY", install = TRUE)
This is not a code chunk you will need in each notebook. As long as install = TRUE, you will only have to do this once!
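As a minimal sketch of how to confirm the key was stored: with install = TRUE, tidycensus writes the key to your .Renviron file under the name CENSUS_API_KEY, so it should be visible as an environment variable once R has been restarted. The helper name has_census_key() is hypothetical and is not part of tidycensus.

```r
# Hypothetical helper (not part of tidycensus): returns TRUE when a key is
# visible to the current R session as the CENSUS_API_KEY environment variable.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (has_census_key()) {
  message("Census API key found.")
} else {
  message("No key found - run census_api_key() and restart R.")
}
```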
To get a preview of variables available in the get_decennial() function, we can use the load_variables() function:
census <- load_variables(year = 2000, dataset = "sf1")
I find it useful to assign the output of this function to an object so that I can search through it. Try searching for the variable P001001, the total population of a geographic unit, in the census object.
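One way to search the object without scrolling is to filter it. The sketch below uses a small stand-in data frame, census_demo, so that it runs without an API key; it assumes the same three columns that load_variables() returns (name, label, concept), and the label values are illustrative only.

```r
library(dplyr)

# Toy stand-in for the table returned by load_variables(); the rows and
# labels here are illustrative, not downloaded from the Census Bureau.
census_demo <- data.frame(
  name    = c("P001001", "P003001", "P003002"),
  label   = c("Total", "Total", "Example race category"),
  concept = c("P1. TOTAL POPULATION [1]", "P3. RACE [8]", "P3. RACE [8]")
)

# exact match on the variable name
total_pop <- filter(census_demo, name == "P001001")

# pattern match on the concept, ignoring case
race_vars <- filter(census_demo, grepl("race", concept, ignore.case = TRUE))
```

The same two filter() calls should work on the real census object, since only the column names are assumed.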
To download data, we can use the get_decennial() function to access, for example, population by state in 2000:
popStates <- get_decennial(geography = "state", year = 2000, variables = "P001001")
A full list of the geographies available in tidycensus can be found here.
Most variables in the decennial census are actually a part of a table. There are individual variables, for example, for race:
census %>%
  filter(concept == "P3. RACE [8]")
We rarely want to download these one at a time. Instead, we want to download them all at once into a single data frame. The table number for these data is P003 - we take the first four characters from the name variable.
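The rule for finding a table number can be sketched in base R. The vector var_names below is a hypothetical stand-in for a few entries from the name column.

```r
# Hypothetical sample of entries from the name column
var_names <- c("P003001", "P003002", "P003003")

# The first four characters of each name identify the table
table_ids <- substr(var_names, 1, 4)

# All three variables belong to the same table
table_id <- unique(table_ids)
```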
cityRace00 <- get_decennial(geography = "tract", year = 2000, state = 29,
                            county = "510", table = "P003", output = "wide")
We’ve used the FIPS codes for both Missouri (29) and St. Louis City (510, which becomes 29510 when combined with the state code) here - you can find a full list of Missouri counties here.
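How the state and county codes fit together can be sketched in base R. The object names below are hypothetical; the codes for Missouri, St. Louis City, and Boone County are the ones used in this notebook.

```r
state_fips  <- "29"   # Missouri
county_fips <- "510"  # St. Louis City

# A county GEOID is the state code followed by the three-digit county code
county_geoid <- paste0(state_fips, county_fips)

# FIPS codes are identifiers, so it is safest to keep them as character.
# If a code was stored as a number, sprintf() restores the leading zeros
# (for example, 19 becomes "019" for Boone County).
boone_fips <- sprintf("%03d", 19L)
```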
The tidycensus package also includes tools for downloading the geometries for these data. For instance, we can add geometric data to our previous call for City of St. Louis tract-level data on race by adding the geometry = TRUE argument:
## download
cityRace00 <- get_decennial(geography = "tract", year = 2000, state = 29,
                            county = "510", table = "P003", output = "wide",
                            geometry = TRUE)
Getting data from the 2000 decennial Census
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Loading SF1 variables for 2000 from table P003. To cache this dataset for faster access to Census tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per Census dataset.
Using Census Summary File 1
## preview
mapview(cityRace00, zcol = "P003005")
Notice how I used the zcol argument for mapview() to preview a specific set of data as a thematic layer on the map! These data are not normalized, but we do get a quick preview of the distribution of Asian residents in St. Louis City.
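Normalizing a count usually means dividing it by the table's total. The sketch below does this on a toy data frame, tracts_demo, with made-up GEOIDs and counts; it assumes P003001 is the table's total column and P003005 is the count being mapped.

```r
library(dplyr)

# Toy wide-format tract data; GEOIDs and counts are made up for illustration
tracts_demo <- data.frame(
  GEOID   = c("29510000001", "29510000002", "29510000003"),
  P003001 = c(2000, 4000, 1000),  # table total
  P003005 = c(100, 100, 100)      # count for the group being mapped
)

# Identical counts can represent very different shares of the population
tracts_demo <- mutate(tracts_demo, pct_group = P003005 / P003001 * 100)
```

Applied to the downloaded object, the same mutate() call would add a column that could then be passed to zcol in place of the raw count.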
To get a preview of variables available in the get_acs() function, we can use the load_variables() function again. We’ll use "acs5" for our dataset and, for this example, we’ll pull from the 2015-2019 five-year estimates by setting year = 2019:
census <- load_variables(year = 2019, dataset = "acs5")
Try searching for the table B19013, the median household income table.
We’ll illustrate get_acs() by using the data in table B19019, which reports median household income by household size. First, we’ll download these data as a full table for all counties in Missouri:
## download
countyIncome <- get_acs(geography = "county", year = 2019, state = 29,
                        table = "B19019", output = "wide", geometry = TRUE)
Getting data from the 2015-2019 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Loading ACS5 variables for 2019 from table B19019. To cache this dataset for faster access to ACS tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per ACS dataset.
## preview
mapview(countyIncome, zcol = "B19019_001E")
Notice how we needed to specify _001E for zcol. That suffix references the specific variable we want to map: variable 1 in the table, as an estimate (E). The columns ending in M hold the margin of error for each estimate - we expect the true value to fall within +/- this amount of the estimate (tidycensus reports margins at the 90 percent confidence level by default).
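The relationship between the E and M columns can be sketched with a toy data frame. The county names and dollar amounts in income_demo are made up; only the column naming pattern comes from the download above.

```r
library(dplyr)

# Toy wide-format ACS output; names and values are made up for illustration
income_demo <- data.frame(
  NAME        = c("County A", "County B"),
  B19019_001E = c(50000, 42000),  # estimate
  B19019_001M = c(2500, 4000)     # margin of error
)

income_demo <- income_demo %>%
  mutate(
    lower   = B19019_001E - B19019_001M,       # low end of the interval
    upper   = B19019_001E + B19019_001M,       # high end of the interval
    moe_pct = B19019_001M / B19019_001E * 100  # margin relative to estimate
  )
```

A large moe_pct is a rough signal that an estimate is imprecise, which is common for small geographies.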
We can also download a specific column, like the median income for one-person households (B19019_002):
## download
countyIncome <- get_acs(geography = "county", year = 2019, state = 29,
                        variables = "B19019_002", output = "wide",
                        geometry = TRUE)
Getting data from the 2015-2019 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## preview
mapview(countyIncome, zcol = "B19019_002E")
Perhaps we have a range of data that we want to include. For this example, we’ll download data on median income and the proportion of women in tracts in Boone County, Missouri. We’ll download the income data with geometry = TRUE and the sex data with geometry = FALSE:
## download
booneIncome <- get_acs(geography = "tract", year = 2019, state = 29,
                       county = "019", variables = "B19019_001",
                       output = "wide", geometry = TRUE) %>%
  rename(median_income = B19019_001E) %>%
  select(GEOID, median_income)
Getting data from the 2015-2019 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## download
booneSex <- get_acs(geography = "tract", year = 2019, state = 29,
                    county = "019", variables = c("B01001_001", "B01001_026"),
                    output = "wide") %>%
  mutate(pct_women = B01001_026E/B01001_001E*100) %>%
  select(GEOID, pct_women)
Getting data from the 2015-2019 5-year ACS
To combine these data, we’ll use left_join() from dplyr. Our sf object should always be the first object in the join (the x data) and our non-sf data should be the second (the y data), so that the result keeps its sf class and geometry:
boone <- left_join(booneIncome, booneSex, by = "GEOID")
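After a join, it is worth checking that every row found a match. The sketch below uses two toy data frames with made-up GEOIDs and values, so the object names and numbers are assumptions rather than the downloaded Boone County data.

```r
library(dplyr)

# Toy x and y data; GEOIDs and values are made up for illustration
income_demo <- data.frame(
  GEOID         = c("29019000100", "29019000200", "29019000300"),
  median_income = c(41000, 52000, 38000)
)
sex_demo <- data.frame(
  GEOID     = c("29019000100", "29019000200"),
  pct_women = c(51.2, 49.8)
)

# left_join() keeps every row of x and fills unmatched y columns with NA
joined <- left_join(income_demo, sex_demo, by = "GEOID")

# anti_join() returns the rows of x that have no match in y
unmatched <- anti_join(income_demo, sex_demo, by = "GEOID")
```

If unmatched has zero rows, every tract in the x data picked up a value from the y data.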
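Three common issues arise:
1. The ID variables have different names in the two objects. Map one name to the other in the join: by = c("GEOID" = "geoid")
2. The ID variables have different types (for example, character and numeric). Convert one so that both match: booneIncome <- mutate(booneIncome, GEOID = as.numeric(GEOID))
3. Both objects are sf objects. Remove the geometry from the y data first: st_geometry(booneSex) <- NULL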
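The first two issues can be reproduced and fixed on toy data. In this sketch the data frames x and y are hypothetical, and the numeric ID is converted to character rather than the other way around, which is a design choice made here because character IDs preserve any leading zeros.

```r
library(dplyr)

# Toy data: y uses a different column name and stores the ID as a number
x <- data.frame(
  GEOID         = c("29019000100", "29019000200"),
  median_income = c(41000, 52000)
)
y <- data.frame(
  geoid     = c(29019000100, 29019000200),
  pct_women = c(51.2, 49.8)
)

# Fix the type mismatch; sprintf("%.0f") avoids scientific notation
y <- mutate(y, geoid = sprintf("%.0f", geoid))

# Fix the name mismatch by mapping x's column to y's column
fixed <- left_join(x, y, by = c("GEOID" = "geoid"))
```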
To get data from the TIGER/Line database, we can use the tigris package. You can see a full list of the data available here.
With cb = TRUE, we can download a generalized (cartographic boundary) version, which smooths out state boundaries so that the overall image is both smaller in disk size and (sometimes) easier to read. This is particularly helpful if you are making small-scale maps of the entire United States. We’ll get these data at the “20m” resolution using the states() function:
states <- states(cb = TRUE, resolution = "20m")
Now, we’ll get more detailed data - all of the county boundaries for Missouri. We’ll use the counties() function with a slightly less generalized resolution, “5m”, and pass state = 29 so that only Missouri counties are returned:
moCounties <- counties(state = 29, cb = TRUE, resolution = "5m")
Now, we’ll get even more detailed data - all of the tract boundaries for St. Charles County, Missouri. We’ll use the tracts() function, which defaults to cb = FALSE and so returns the detailed TIGER/Line boundaries rather than the generalized version:
stCharlesTracts <- tracts(state = 29, county = 183)