This notebook provides examples of:
st_transform()
,st_as_sf()
,st_read()
,st_area()
and the measurements
package,shape
and size
arguments with ggplot2
.This notebook requires a number of packages:
# tidyverse packages
library(dplyr) # data cleaning
library(ggplot2) # static mapping
library(readr) # read/write tabular data
# spatial packages
library(mapview) # preview spatial data
library(sf) # spatial data tools
library(tigris) # access TIGER/line data
# other packages
library(here) # file path management
library(measurements) # unit conversion
library(naniar) # missing data
This notebook requires one set of data:
sluPlaces <- read_csv(here("data", "example-data", "sluPlaces.csv"))
Rows: 6 Columns: 4
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): name
dbl (3): id, lng, lat
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Make sure your x
and y
coordinate variables are numeric or double:
str(sluPlaces)
spec_tbl_df [6 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ id : num [1:6] 1 2 3 4 5 6
$ name: chr [1:6] "Morrissey Hall" "Starbucks" "Simon Rec" "Pius Library" ...
$ lng : num [1:6] -90.2 -90.2 -90.2 -90.2 -90.2 ...
$ lat : num [1:6] 38.6 38.6 38.6 38.6 38.6 ...
- attr(*, "spec")=
.. cols(
.. id = col_double(),
.. name = col_character(),
.. lng = col_double(),
.. lat = col_double()
.. )
- attr(*, "problems")=<externalptr>
If they are not, use mutate()
with as.numeric()
to convert them.
Next, we need to make sure there are no missing data in our coordinate columns. This will cause errors when we go to project these data.
miss_var_summary(sluPlaces)
Great! We have no missing data in lng
and lat
. If we did, we would need to use filter()
from dplyr
to remove those missing values. Also be aware of coordinates that are set to 0,0. These can sometimes be used to identify missing data as well.
sluPlaces %>%
filter(lng == 0 | lat == 0)
Again, excellent, no missing data!
Identifying the coordinate system that lng
and lat
represent can be challenging:
-90.2
in the longitude column, this immediately suggests to me that we have decimal degrees data here. If the data originate in the United States, they’re typically encoded using NAD 1983. If they are international data, they’ll use WGS 1984.x,y
pairs for State Plane meters do not correspond to State Plane feet. So, we need to experiment here. Many local municipalities rely on State Plane for mapping out of tradition, and it’s therefore common to run into both the feet and meters versions when working with local data.Once we have candidates for projecting our points, we need to identify the CRS/EPSG values (or, alternatively, the Proj4 strings) that correspond to our coordinate system candidate(s). For this, we’ll use websites like EPSG.io and Spatial Reference.
First, we want to convert these data to from a tibble
to an sf
object with st_as_sf()
. We use the lng
variable as our x
variable and lat
as our y
variable, and use 4269
for our crs
argument since these data are in decimal degrees and this corresponds to the likely coordinate system we identified above:
sluPlaces_sf <- st_as_sf(sluPlaces, coords = c("lng", "lat"), crs = 4269)
Next, we want to confirm that this worked:
mapview(sluPlaces_sf)
Excellent!
We’ve already used st_transform()
, but now can do so with purpose. For example, to convert our data to State Plane (meters). We’ll use the data based on the 2007 update to NAD 1983:
sluPlaces_statePlane <- st_transform(sluPlaces_sf, crs = 3601)
If we need our data in feet, there are also options. However, these are ESRI products that are not included in the sf
package. How do we know? We can use the EPSG value 102696
.
st_transform(sluPlaces_sf, crs = 102696)
Note that error includes this language - GDAL Error 1: PROJ: proj_create_from_database: crs not found
. Even so, we can still use the coordinate system by specifying the Proj4
string value instead of the CRS number:
sluPlaces_statePlane_ft <- st_transform(sluPlaces_sf, crs = "ESRI:102696")
Sometimes this doesn’t work, and so it is also useful to know how to apply Proj4
strings as well.
sluPlaces_statePlane_ft <- st_transform(sluPlaces_sf, crs = "+proj=tmerc +lat_0=35.83333333333334 +lon_0=-90.5 +k=0.9999333333333333 +x_0=250000 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs ")
This should give us correctly projected data. Using this trick with the Proj4
strings also works, by the way, with st_as_sf()
as well.
Finally, we’ll write our data:
st_write(sluPlaces_statePlane, here("data", "example-data", "clean-data", "sluPlaces.shp"), delete_dsn = TRUE)
Deleting source `/Users/prenercg/GitHub/slu-soc5650/module-3-projections/data/example-data/clean-data/sluPlaces.shp' using driver `ESRI Shapefile'
Writing layer `sluPlaces' to data source
`/Users/prenercg/GitHub/slu-soc5650/module-3-projections/data/example-data/clean-data/sluPlaces.shp' using driver `ESRI Shapefile'
Writing 6 features with 2 fields and geometry type Point.
The st_write()
function identifies the file type from what you include at the end of the here()
statement. If I am working solely in R
, I tend to use .geojson
because:
However, if you are going to be working in the ESRI ecosystem, saving data as shapefiles is suggested. Note that #1 above needs to be addressed - keep variable names short (8 characters or less) and convert big numbers to strings or remove them completely before saving.
One final task we have when working with projections is to use them as the basis for making calculations. Often, this entails calculating area so that we can use it for normalizing our data. Sometimes our data come with measurements, but we don’t know what those units are. Other times, we don’t have an area measure at all. Consider these data from tigris
:
moCounties <- counties(state = 29) %>%
select(GEOID, NAMELSAD, ALAND, AWATER)
|
| | 0%
|
|= | 0%
|
|= | 1%
|
|== | 1%
|
|== | 2%
|
|=== | 2%
|
|==== | 2%
|
|==== | 3%
|
|===== | 3%
|
|===== | 4%
|
|====== | 4%
|
|======= | 4%
|
|======= | 5%
|
|======== | 5%
|
|======== | 6%
|
|========= | 6%
|
|========= | 7%
|
|========== | 7%
|
|=========== | 7%
|
|=========== | 8%
|
|============ | 8%
|
|=============== | 10%
|
|================= | 11%
|
|=================== | 13%
|
|===================== | 14%
|
|===================== | 15%
|
|====================== | 15%
|
|======================= | 15%
|
|======================= | 16%
|
|======================== | 16%
|
|======================== | 17%
|
|========================= | 17%
|
|========================== | 17%
|
|========================== | 18%
|
|============================= | 20%
|
|================================ | 22%
|
|=================================== | 24%
|
|====================================== | 26%
|
|========================================= | 28%
|
|============================================ | 30%
|
|=============================================== | 32%
|
|================================================== | 34%
|
|===================================================== | 36%
|
|====================================================== | 37%
|
|========================================================= | 39%
|
|============================================================ | 41%
|
|============================================================= | 41%
|
|============================================================== | 42%
|
|=============================================================== | 43%
|
|================================================================ | 44%
|
|=================================================================== | 46%
|
|====================================================================== | 48%
|
|========================================================================= | 50%
|
|============================================================================ | 52%
|
|=============================================================================== | 54%
|
|================================================================================ | 55%
|
|=================================================================================== | 57%
|
|====================================================================================== | 59%
|
|========================================================================================= | 61%
|
|========================================================================================== | 61%
|
|========================================================================================== | 62%
|
|=========================================================================================== | 62%
|
|=========================================================================================== | 63%
|
|============================================================================================ | 63%
|
|=============================================================================================== | 65%
|
|================================================================================================== | 67%
|
|==================================================================================================== | 68%
|
|==================================================================================================== | 69%
|
|===================================================================================================== | 69%
|
|====================================================================================================== | 70%
|
|======================================================================================================= | 70%
|
|======================================================================================================= | 71%
|
|======================================================================================================== | 71%
|
|========================================================================================================= | 72%
|
|========================================================================================================== | 72%
|
|========================================================================================================== | 73%
|
|=========================================================================================================== | 73%
|
|=========================================================================================================== | 74%
|
|============================================================================================================ | 74%
|
|=============================================================================================================== | 76%
|
|================================================================================================================== | 78%
|
|==================================================================================================================== | 80%
|
|===================================================================================================================== | 80%
|
|====================================================================================================================== | 80%
|
|====================================================================================================================== | 81%
|
|======================================================================================================================= | 81%
|
|======================================================================================================================= | 82%
|
|======================================================================================================================== | 82%
|
|=========================================================================================================================== | 84%
|
|============================================================================================================================== | 86%
|
|================================================================================================================================= | 88%
|
|================================================================================================================================== | 89%
|
|=================================================================================================================================== | 90%
|
|==================================================================================================================================== | 90%
|
|==================================================================================================================================== | 91%
|
|===================================================================================================================================== | 91%
|
|======================================================================================================================================== | 93%
|
|=========================================================================================================================================== | 95%
|
|============================================================================================================================================= | 97%
|
|=============================================================================================================================================== | 98%
|
|================================================================================================================================================ | 98%
|
|================================================================================================================================================ | 99%
|
|==================================================================================================================================================| 100%
There are both ALAND
and AWATER
columns, but it isn’t immediately clear what their units are (they are in meters, FYI). If we want the total area of these units, we can combine a few functions from sf
, dplyr
, and measurements
to achieve a few different outcomes. First, we’ll re-calculate the area based on a projected coordinate system. We need to move from the current coordinate system to a projected coordinate system. To get a sense of our starting place, we’ll use st_crs()
:
st_crs(moCounties)
Coordinate Reference System:
User input: NAD83
wkt:
GEOGCRS["NAD83",
DATUM["North American Datum 1983",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4269]]
This confirms that we are using a geographic coordinate system (this is typical for TIGER/Line data). We’ll switch to Albers Equal Area Conic for the contiguous United States:
moCounties <- st_transform(moCounties, crs = "ESRI:102003")
# moCounties <- st_transform(moCounties, crs = "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
I’ve also included the Proj4
string, but it is commented out. Notice how +units=2
appears in the Proj4
string. This means are data are in meters. If we wanted to recalculate our area units meters, we can use the st_area()
function and the geometry
column to do so:
moCounties %>%
select(-c("ALAND", "AWATER")) %>%
mutate(sq_m = as.numeric(st_area(geometry)), .after = NAMELSAD) %>%
mutate(sq_km = conv_unit(sq_m, from = "m2", to = "km2"), .after = sq_m) -> moCounties
We can do the same thing if we want to convert to square miles:
moCounties <- mutate(moCounties, sq_mi = conv_unit(sq_m, from = "m2", to = "mi2"), .after = sq_km)
Writing the sq_m
column to shapefile might be tricky, so its best to do these conversions and then remove the source column before saving!
One thing I wanted to share quickly this week is how we can approach displaying points with ggplot2
, since we happen to have some point data to work with. We’ll use the shape
and size
arguments to make simple adjustments:
ggplot() +
geom_sf(data = sluPlaces_sf, shape = 18, size = 4)
Some point symbols - those with values 21 through 25 - can be customized with fills and colors as well:
ggplot() +
geom_sf(data = sluPlaces_sf, shape = 22, size = 6, fill = "#9d4a9d", color = "#5d92e5")