Learning objectives

  1. Download species locations & map layers
  2. Process spatial data
  3. Produce a publication ready map

Introduction

For many research papers, reports or popular science articles it is increasingly valuable to provide a spatial context to your findings. Visual aids, such as maps, can be excellent communication tools when sharing your findings with colleagues or the public. It involves communication, intuition and a bit of imagination. However, if used badly, they can lose their meaning or provide straight up disinformation.

To create your own maps, it is important to understand some of the core types of data that you may be using. Spatial data comes in two formats: vectors and rasters.

Vector data

Vectors store information using either points, lines or polygons. You may be familiar with vectors in the format shapefiles (.shp), kmz/kml or gpx. We will mostly work with shapefiles. Vector files are useful for delineating boundaries, such as those for countries, protected areas. They are also often used for displaying lines or points, such as roads or species locations, respectively.

Different types of vector features supported by `sf` (or simple feature) package. Credit: Lovelace, Nowosad, Muenchow 2022: Geocomputation with R. Chapter 2

Different types of vector features supported by sf (or simple feature) package. Credit: Lovelace, Nowosad, Muenchow 2022: Geocomputation with R. Chapter 2

There are several different R packages for using vector data, however in this tutorial, we will focus on the sf package for handling and displaying vector data. sf is useful for handling spatial data as the vector objects can be treated like data frames in most functions and in can be combined with the pipe operator %>% and works well with the tidyverse group of R packages. This includes integration into the ggplot2 package, which will help us with making maps later.

Raster data

Rasters represent data in a grid of ‘cells’, which are typically referred to as ‘pixels’. Generally, these cells/pixels have a constant size or spatial resolution (e.g. mm/m/km). Rasters are formatted as a matrix, with rows and columns. Each raster will also come with a ‘header’ or metadata, which includes information on the coordinate reference system (which we discuss later), an extent and an origin.

The values inside each raster cell can represent either continuous or categorical data. Continuous data may include values such as temperature, elevation or spectral data. Categorical data may include land cover types or discrete categories of soil classes.

Different types of raster data. Credit: Lovelace, Nowosad, Muenchow 2022: Geocomputation with R. Chapter 2

Different types of raster data. Credit: Lovelace, Nowosad, Muenchow 2022: Geocomputation with R. Chapter 2

Much like vector data, there are several R packages available for handling raster data. In this course, we will focus on the terra package to help streamline your learning process. terra is the natural successor to the raster package. The terra package is faster and provides update functionality to the raster package.

Coordinate Reference Systems

The last topic to introduce you to before getting coding is Coordinate Reference System or CRS. Both raster and vector data share this property. It defines our spatial objects are represented on the surface of the earth.

CRSs can be either GEOGRAPHIC or PROJECTED.

Geographic CRSs specify a location on the surface of the earth using LONGITUDE and LATITUDE. This means that distances are not measured in meters, as the earth is measured as 3D-surface: either a spherical or ellipsoidal surface.

Projected CRSs represent a projected surface of the 3D earth onto a 2D. These allow distances to be measured in meters. Every projection will have some kind of distortion. This distortion will impact the area, direction, distance or shape of the projection. This means it is important to keep in mind when doing area or distance based calculations! The best tip is to make sure that your projection matches the space that you are working in.

Watch this video to get a better visual idea of the difference between a Geographic and Projected CRS:

To find a list of all available projections use this function: sf_proj_info(type = "proj") and more information on each one can be seen at Geo Projections. The webpage thetruesize, is also a fun and useful tool to explore how the Mercator projection distorts areas.

Tutorial

That covers the basics. For a fuller explanation of these basics, please visit the Geocomputation with R notebook. In this tutorial, you will become more familiar with vector based data for making a map of species localities. In the following tutorials, you will get some more experience in handling raster based data. Let’s get started with coding. The first step in our coding workflow is to install and load the required packages:

# install.packages('rgbif')
# install.packages('tidyverse')
# install.packages('CoordinateCleaner')
# install.packages('rnaturalearth')
# install.packages('sf')
# install.packages('mapview')
# install.packages('ggspatial')
# install.packages('patchwork')

library(tidyverse)
library(rgbif)
library(CoordinateCleaner)
library(rnaturalearth)
library(sf)
library(mapview)
library(ggspatial)
library(patchwork)

# # Alternative method to load many libraries
# if(!require("pacman")) install.packages("pacman")
# pacman::p_load(tidyverse, rgbif, CoordinateCleaner, rnaturalearth, sf, mapview, ggspatial, patchwork)

Downloading & cleaning species data from GBIF

We will download the data from the Global Biodiversity Information Facility (GBIF). This is a massive online database of museum, herbarium, large database and in person observations. It has recently been integrated with iNaturalist (a citizen science based observation platform), meaning it has both historical and contemporary observations.

We will be working on Protea roupelliae. A summer rainfall Protea species, which occurs extensively across eastern South Africa, Lesotho and eSwatini. There are two subspecies, but we will treat these as the same thing in this tutorial.

# Name your species
myspecies <- c("protea roupelliae")

# download GBIF occurrence data for this species; this takes time if there are many data points!
gbif_download <- occ_data(scientificName = myspecies, hasCoordinate = TRUE, limit = 1000)

gbif_data <- gbif_download$data
names(gbif_data)[1:10]
##  [1] "key"               "scientificName"    "decimalLatitude"  
##  [4] "decimalLongitude"  "issues"            "datasetKey"       
##  [7] "publishingOrgKey"  "installationKey"   "publishingCountry"
## [10] "protocol"

The GBIF data includes many historical observations, which were taken before the use of GPS. Because of this, the locations provided may have errors, lack precision or may not be at the required spatial scale for downstream analysis. Please read this text for more information on why we need to clean GBIF location data.

We use the CoordinateCleaner package to use some useful functions that allow for easy cleaning of the locality data. Each function gives an output telling us how many observations were dropped.

gbif_data %>%
  filter(year >= 1900) %>%
  filter(!coordinateUncertaintyInMeters %in% c(301,3036,999,9999)) %>% # known inaccurate default values
  filter(!decimalLatitude == 0 | !decimalLongitude == 0) %>%
  cc_cen(lat = 'decimalLatitude', lon = 'decimalLongitude', buffer = 2000) %>% # remove country centroids within 2km 
  cc_cap(lat = 'decimalLatitude', lon = 'decimalLongitude', buffer = 2000) %>% # remove capitals centroids within 2km
  cc_inst(lat = 'decimalLatitude', lon = 'decimalLongitude', buffer = 2000) %>% # remove zoo and herbaria within 2km 
  cc_sea(lat = 'decimalLatitude', lon = 'decimalLongitude') %>% # remove from ocean 
  distinct(decimalLongitude,decimalLatitude,speciesKey,datasetKey, .keep_all = TRUE) -> p_roup
## Testing sea coordinates
## Testing biodiversity institutions
## Testing country capitals
## Testing country centroids
## Removed 0 records.
## Removed 0 records.
## Removed 26 records.
## OGR data source with driver: ESRI Shapefile 
## Source: "/private/var/folders/x4/zllw3fdd4dz3y7trfk3ff4cm0000gn/T/RtmpOfIFpa", layer: "ne_110m_land"
## with 127 features
## It has 3 fields
## Removed 29 records.

Downloading country boundaries

Next we use the rnaturalearth package to download country boundaries. We start with all world countries and then plot this using ggplot2 and use the default GDP dataset for visualisation purposes.

# using the maps package, download and convert the download to an sf object
world <- ne_countries(scale = 'medium', returnclass = 'sf')
names(world)
##  [1] "scalerank"  "featurecla" "labelrank"  "sovereignt" "sov_a3"    
##  [6] "adm0_dif"   "level"      "type"       "admin"      "adm0_a3"   
## [11] "geou_dif"   "geounit"    "gu_a3"      "su_dif"     "subunit"   
## [16] "su_a3"      "brk_diff"   "name"       "name_long"  "brk_a3"    
## [21] "brk_name"   "brk_group"  "abbrev"     "postal"     "formal_en" 
## [26] "formal_fr"  "note_adm0"  "note_brk"   "name_sort"  "name_alt"  
## [31] "mapcolor7"  "mapcolor8"  "mapcolor9"  "mapcolor13" "pop_est"   
## [36] "gdp_md_est" "pop_year"   "lastcensus" "gdp_year"   "economy"   
## [41] "income_grp" "wikipedia"  "fips_10"    "iso_a2"     "iso_a3"    
## [46] "iso_n3"     "un_a3"      "wb_a2"      "wb_a3"      "woe_id"    
## [51] "adm0_a3_is" "adm0_a3_us" "adm0_a3_un" "adm0_a3_wb" "continent" 
## [56] "region_un"  "subregion"  "region_wb"  "name_len"   "long_len"  
## [61] "abbrev_len" "tiny"       "homepart"   "geometry"
# We can easily plot the world map layer using ggplot
ggplot() +
  geom_sf(data = world, aes(fill = gdp_md_est)) + # use the fill option within the aesthetics parameter
  scale_fill_viridis_c(trans = 'log', name = 'GDP (log)') # chose a palette for the colour gradient, transform the variable and add a name for the legend

Importantly, to convert a 3D space (the earth as a sphere) into a 2D image, we need to ‘project’ the world. There are many different attempts at accurate projections, but they normally need to compromise between area, shape and distance. The projection is important to keep in mind if you are doing any area or distance calculations. Generally, you will want to use a local or regional projection for this. Read some additional resources on projections here, here and here. Let’s try the Lambert Azimuthal Equal-Area projection:

# We can also change the projection of the coordinate reference system (CRS)
ggplot() +
  geom_sf(data = world, aes(fill = gdp_md_est)) +
  scale_fill_viridis_c(trans = 'log', name = 'GDP (log)') +
  coord_sf(crs = 3035)

Refining our map & plotting our species localities

Let’s move our focus back to our species map for Protea roupelliae. We now want to filter down the countries included in a country boundary dataset to only include those within our region of interest.

# Create a vector of names for countries we want to use (= southern African countries) to filter to our area of focus
countries <- c('South Africa', 'Lesotho', 'Swaziland', 'Namibia', 'Botswana', 'Zimbabwe', 'Mozambique')

# Now filter down the list of all countries to only include southern African countries
world %>% filter(admin %in% countries) -> sern_a

# Plot this using geom_sf from ggplot()
ggplot() +
  geom_sf(data = sern_a) 

# This looks a bit strange, it includes Marion Island in the sub-antarctic, which is part of South Africa!
# Remove this by setting limits on the x and y axes
base_plot1 <- ggplot() + # save the output of the map to an object
  geom_sf(data = sern_a) +
  scale_x_continuous(limits = c(16, 33)) +
  scale_y_continuous(limits = c(-35, -22))
base_plot1 # print this object, we can now reuse 'base_plot1' and add to it later

We can now plot add our species locality points to the map! First clean up the GBIF dataset a bit further and then convert the data.frame to a simple feature or sf object.

# Now we're focused in on South Africa, let's add the GBIF points to the map
# First clean the dataset by selecting only the columns we need, rename the variables
p_roup %>% select(key, year, basisOfRecord, decimalLongitude, decimalLatitude) %>% rename(lon = decimalLongitude, lat = decimalLatitude) -> pr_clean

# Now we want to convert & project our data to the correct Coordinate Reference System
pr_no_crs <- st_as_sf(pr_clean, coords = c('lon', 'lat'))
st_crs(pr_no_crs) # as we converted these directly from GPS points, there is no set CRS
## Coordinate Reference System: NA
# Let's try plot this
base_plot1 +
  geom_sf(data = pr_no_crs)
## Error in st_transform.sfc(X[[i]], ...): cannot transform sfc object with missing crs

When we try to plot this we get an error… This is because we have simply used the longitude and latitude coordinates without provided a coordinate reference system (CRS). Let’s provide a CRS and try again:

st_crs(sern_a)$input # first see what CRS the map layers have
## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
pr_WGS_84 <- pr_no_crs # create a new object for the points
st_crs(pr_WGS_84) = 4326 # assigning it the CRS 4326, which is short for the World Geodetic System 1984 (WGS 84), a spatial reference system. 
st_crs(pr_WGS_84)$input # check that this worked. These give different input codes. One using the proj4string and the other the EPSG code. These are essentially the same thing. 
## [1] "EPSG:4326"
# Let's try plot this now
base_plot1 +
  geom_sf(data = pr_WGS_84) 

Interactive maps to identify odd features

Some of these points look a bit strange. It is always good to inspect these to see if they may be an issue later. We can make a simple interactive map to explore this. We can use the leaflet package to make interactive maps. Let’s include information on the date & type of the observation.

mapview(pr_WGS_84)

Making a ‘good’ map

You should notice when you zoom in that many points are too regularly spaced to be exact locations of species sightings; rather, such points are likely to be centroids of (relatively large) grid cells on which particular surveys were based. This is not necessarily a problem, but remember to adjust the spatial resolution of your analysis accordingly! In our case, we want to now remove these observations from our dataset. An easy way to do this is to filter by date. We will only use dates from after 1980.

pr_WGS_84 %>% filter(year > 1980) -> pr_1980

To make an appealing map, it is useful to separate the regions of interest from those that form part of the background. In our case, South Africa, Lesotho and eSwatini (which is still labeled as Swaziland in many online databases) are our regions of interest.

# let's separate the countries with points from those without
sa_les_esw <- sern_a %>% filter(admin %in% c('South Africa', 'Lesotho', 'Swaziland'))
other_sern_a <- sern_a %>% filter(!admin %in% c('South Africa', 'Lesotho', 'Swaziland')) # use a 'negative' filter to include all other countries

Let’s now prepare our ‘finished’ plot. We can now plot our regions of interest in one colour and our background regions as a different colour. We can change the transparancy and size of our species locality and we focus in the limits of our map for the x and y axes:

ggplot() +
  geom_sf(data = other_sern_a, fill = 'gray90', lwd = 0.3) + # change the fill colour
  geom_sf(data = sa_les_esw, fill = 'white', lwd = 0.3) + # change the fill colour
  geom_sf(data = pr_1980, alpha = 0.6, size = 2) + # change the transparency and size of the points  
  scale_x_continuous(limits = c(16, 33), breaks = seq(16, 32, 4)) + # add in breaks for the labels using sequence function
  scale_y_continuous(limits = c(-35, -22), breaks = seq(-34, -22, 4)) + # add in breaks for the labels using sequence function
  xlab('Longitude') + ylab('Latitude') 

Next, we can add in the a scale bar and north arrow using the ggspatial package. We add in text using the annotate function and lastly, we change some elements of the overall theme and background elements, such as the ocean colour and graticule (or grid) lines to be more visually pleasing. Save as this as another base plot for later use.

base_plot2 <- ggplot() +
  geom_sf(data = other_sern_a, fill = 'gray90', lwd = 0.3) + # change the fill colour
  geom_sf(data = sa_les_esw, fill = 'white', lwd = 0.3) + # change the fill colour
  geom_sf(data = pr_1980, alpha = 0.6, size = 2) + # change the transparency and size of the points  
  scale_x_continuous(limits = c(16, 33), breaks = seq(16, 32, 4)) + # add in breaks for the labels using sequence function
  scale_y_continuous(limits = c(-35, -22), breaks = seq(-34, -22, 4)) + # add in breaks for the labels using sequence function
  xlab('Longitude') + ylab('Latitude') + # change the x and y axes labels
  
  
  annotation_scale(location = 'br', width_hint = 0.1, style = 'ticks', tick_height = 0) + # add in a scale bar
  annotation_north_arrow(location = 'br', pad_y = unit(1, 'cm'), style = north_arrow_minimal()) + # add in a north arrow, and change around the placement and arrow style
  annotate(geom = 'text', x = 23.5, y = -30.5, label = 'SOUTH AFRICA', colour = 'black', size = 4) + # add in text for SA
  annotate(geom = 'text', x = 28.25, y = -29.5, label = 'LESOTHO', colour = 'black', size = 2.6) + # add in text for Lesotho
  annotate(geom = 'text', x = 31.45, y = -26.6, label = 'ESWATINI', colour = 'black', size = 2.6) + # add in text for eSwatini
  theme_bw() + # edit the overall theme (there are several other options)
  theme(panel.grid = element_line(color = gray(.5), linetype = 'dashed', size = 0.3),
        panel.background = element_rect(fill = '#E8F6FC')) # lastly, change the grid line parameters and change the background (ocean) colour

base_plot2
## Scale on map varies by more than 10%, scale bar may be inaccurate

Saving the map

Within ggplot2, we can use the ggsave function to easily save figures to different formats. PDF and PNG formats are fundamentally different, in that PDFs are vector based while PNGs (or JPEGs) are pixel based. This means you do not need to specify a resolution or DPI (Dots Per Inch) for PDFs, but for pixel based outputs, you should. Save the outputs to your output folder, specifying the file type and the width and height of your plot.

# first as a pdf, specify the output directory, the object and the width & height
ggsave('output/figs/making_a_map/map.pdf', base_plot2, width = 6.68, height = 5.76)
# next as a png, specify the same parameters, but now as it is a pixel-based file type, specify the resolution
ggsave('output/figs/making_a_map/map.png', base_plot2, width = 6.68, height = 5.76, dpi = 300)

Inset map

This is a reasonable place to end, however, if your map is of a small area (luckily not our case), it can often be useful to add an inset map to show where your site/study species is relative to other known features. For this, we can produce an inset map. In our case we will produce an inset map showing where southern Africa is relative to the rest of Africa, with a box showing our plot extent.

# Let's create a box that covers our map to show where our site is.
# We will use the limits of the previous map 

# A rectangle requires 5 points to make a complete polygon, so we will create a list with 2 columns (lon & lat) and 5 rows (5 points)
pol_lon = c(16, 33)
pol_lat = c(-35, -22)
coord_df = data.frame(pol_lon, pol_lat)

# Now make the list
coord_list <- list(
  cbind(
    coord_df$pol_lon[c(1,2,2,1,1)], 
    coord_df$pol_lat[c(1,1,2,2,1)])
)

# Convert the points to an sf polygon
base_plot2_bbox <- st_polygon(coord_list, dim = 'XY')
# Add a CRS that matches our previous systems
map_bbox <- st_sfc(base_plot2_bbox, crs = 4326)

# Plot to see if this is working
ggplot() +
  geom_sf(data = world) +
  geom_sf(data = map_bbox, fill = NA, col = 'red') # use fill = NA to make the middle of the polygon transparent

We have identified where our site is relative to other features. Now we can produce a ‘final’ inset map:

#### Let's now zoom in our inset map and try to match some of the other features of our basemap
# Download just Africa
africa <- ne_countries(continent = 'africa', returnclass = 'sf')
other_africa <- africa %>% filter(!admin %in% c('South Africa', 'Lesotho', 'Swaziland'))

# Create the inset map
inset_map <- ggplot() +
  geom_sf(data = other_africa, fill = 'gray90', lwd = 0.2) +
  geom_sf(data = sa_les_esw, fill = 'white', lwd = 0.2) +
  geom_sf(data = map_bbox, fill = NA, col = 'black', lwd = 0.8) +
  scale_x_continuous(limits = c(-20, 50)) +
  scale_y_continuous(limits = c(40, -35)) +
  theme_void() + # this theme removes ALL axes, grids and labels 
  theme(panel.border = element_rect(colour = 'black', fill = NA),
        panel.background = element_rect(fill = '#E8F6FC')) # we then want to add a border and the background colour back in

inset_map

To combine our base map and inset map together, we can use the patchwork package. The function inset_element() requires that we specify where on the base map our inset map will fall.

combined_maps <- base_plot2 + inset_element(inset_map, left = 0.025, bottom = 0.6, right = 0.4, top = 0.975) 
# we need to chose the positioning of the inset, assuming the plot values go from 0 to 1 for each axis.
combined_maps
## Scale on map varies by more than 10%, scale bar may be inaccurate

Then we can save our final combined map:

# Save the output
ggsave('output/figs/making_a_map/combined_maps.png', combined_maps, width = 6.68, height = 5.76, dpi = 300)  

Lastly, we want to save our filtered and cleaned species locality data as a shapefile to use in our SDMs later:

# write the P. roupelliae locations to a shapefile
write_sf(pr_1980, 'output/files/making_a_map/p_roup_gbif.shp')

Extra resources for map making

The Geocomputation with R book has lots of excellent tutorials on the theory and application of spatial data in R. For a start here is a full tutorial showing many different approaches to making maps in R.