galah

Data from living atlases in R

Martin Westgate / Atlas of Living Australia
Statistical Society of Australia / 2022-09-08

R & ALA
A brief history

ALA4R / benefits


  • Groundbreaking: released in 2014
  • Flexible: return the data you want, customised in various ways
  • Inclusive: most options accessible via the API can be constructed

ALA4R / problems


No function naming convention

  • abbreviations: aus()
  • snake case: ala_fields()
  • contractions: fieldguide()
  • single words: occurrences(), images()

ALA4R / problems


Confusing syntax

  • unclear differences between functions
    • ala_list(), ala_lists(), specieslist()
  • argument names require specialist knowledge
    • wkt, fq, qa
  • arguments require solr queries passed as strings:
    • "taxon_name:\"Alaba vibex\""
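As a small base-R illustration of the last point: the backslashes in the string above are only R's escaping for the inner double quotes, so the query itself contains plain quotation marks. The helper `build_fq()` below is hypothetical (it is not an ALA4R function) and only shows the kind of string users had to assemble by hand.

```r
# The query as users had to type it, with inner quotes escaped for R
fq <- "taxon_name:\"Alaba vibex\""

# cat() shows the characters the string really contains:
# taxon_name:"Alaba vibex"
cat(fq, "\n")

# Hypothetical helper (not part of ALA4R): build the same string
# from a field name and a value
build_fq <- function(field, value) {
  sprintf('%s:"%s"', field, value)
}

# The helper reproduces the hand-written string exactly
identical(build_fq("taxon_name", "Alaba vibex"), fq)
```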

ALA4R / problems


Inconsistent behaviour

  • most functions return a data.frame
  • occurrences() returns a list
  • fieldguide() and plot.occurrences() output a PDF

{galah}
Tidy principles for living atlases

galah / benefits

  • Query the ALA (and other national GBIF nodes)
  • Use tidy, pipe-able syntax

galah / benefits



Lookup          Narrow a query      Run a query
show_all()      galah_filter()      atlas_counts()
search_all()    galah_select()      atlas_occurrences()
                galah_group_by()    atlas_media()

Data / number of records

library(galah)

galah_call() |>
  galah_identify("Eolophus roseicapilla") |> # galahs
  atlas_counts()
# A tibble: 1 × 1
   count
   <int>
1 993702

Data / number of records

galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(year >= 2010,
               dataResourceName == "iNaturalist Australia") |>
  atlas_counts()
# A tibble: 1 × 1
  count
  <int>
1  7709

Data / number of records

galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(year >= 2010,
               dataResourceName == "iNaturalist Australia") |>
  galah_group_by(year) |>
  atlas_counts()
# A tibble: 13 × 2
   year  count
   <chr> <int>
 1 2021   1939
 2 2020   1579
 3 2022   1287
 4 2019    947
 5 2018    836
 6 2017    539
 7 2016    197
 8 2015    110
 9 2014     81
10 2013     62
11 2011     54
12 2012     42
13 2010     36

Data / number of records

galah_call() |>
  galah_identify("Cacatuidae") |> # cockatoos
  galah_filter(year >= 2019) |>
  galah_group_by(year, dataResourceName) |>
  atlas_counts()
# A tibble: 15 × 3
   dataResourceName             year   count
   <chr>                        <chr>  <int>
 1 eBird Australia              2021  248142
 2 eBird Australia              2020  213750
 3 eBird Australia              2019  173059
 4 iNaturalist Australia        2021    7661
 5 iNaturalist Australia        2020    6110
 6 iNaturalist Australia        2022    5347
 7 iNaturalist Australia        2019    3555
 8 NSW BioNet Atlas             2020    7309
 9 NSW BioNet Atlas             2019    6620
10 NSW BioNet Atlas             2021    2394
11 NSW BioNet Atlas             2022     814
12 Victorian Biodiversity Atlas 2019    3379
13 Victorian Biodiversity Atlas 2020    1174
14 Victorian Biodiversity Atlas 2021     118
15 Victorian Biodiversity Atlas 2022       9

Data / occurrences

library(galah)
library(ozmaps)
library(sf)
library(ggplot2)

# Enter the email address registered with the ALA (needed to download records)
galah_config(email = "martinjwestgate@gmail.com")

# Download species occurrences
obs <- galah_call() |>
  galah_identify("peramelidae") |> # bandicoots
  galah_filter(year == 2021) |>
  atlas_occurrences()

# Ensure map uses correct projection
oz_wgs84 <- ozmap_data(
  data = "country") |>
  st_transform(crs = st_crs("WGS84"))

# Map points
ggplot(data = obs) + 
  geom_sf(data = oz_wgs84, 
          fill = "white") +
  geom_point(aes(
      x = decimalLongitude,
      y = decimalLatitude), 
    color = "#78cccc") +
  theme_void()

Data / other atlases

library(gt)
show_all_atlases() |> gt()
atlas           institution                                                              acronym  url
Australia       Atlas of Living Australia                                                ALA      https://www.ala.org.au
Austria         Biodiversitäts-Atlas Österreich                                          BAO      https://biodiversityatlas.at
Brazil          Sistemas de Informações sobre a Biodiversidade Brasileira                SiBBr    https://sibbr.gov.br
Canada          Canadensys                                                               NA       http://www.canadensys.net
Estonia         eElurikkus                                                               NA       https://elurikkus.ee
France          Inventaire National du Patrimoine Naturel                                INPN     https://inpn.mnhn.fr
Guatemala       Sistema Nacional de Información sobre Diversidad Biológica de Guatemala  SNIBgt   https://snib.conap.gob.gt
Portugal        GBIF Portugal                                                            GBIF.pt  https://www.gbif.pt
Spain           GBIF Spain                                                               GBIF.es  https://www.gbif.es
Sweden          Swedish Biodiversity Data Infrastructure                                 SBDI     https://biodiversitydata.se
United Kingdom  National Biodiversity Network                                            NBN      https://nbn.org.uk

Data / other atlases

library(purrr)
library(tibble)
library(dplyr)

atlases <- show_all_atlases()

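# Switch to each atlas in turn and count all of its records.
# galah_config() changes the setting globally, so the last atlas stays active.
counts <- map(atlases$atlas, 
  function(x){
    galah_config(atlas = x)
    atlas_counts()
})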

tibble(
  atlas = atlases$atlas, 
  n = unlist(counts)) |> 
  arrange(desc(n)) |>
  gt() |> 
  fmt_number(columns = n)
atlas                        n
United Kingdom  204,987,409.00
Australia       113,660,291.00
Sweden          103,416,609.00
France           87,443,384.00
Spain            33,246,325.00
Brazil           23,839,513.00
Portugal         16,043,865.00
Austria           7,786,013.00
Estonia           6,917,993.00
Canada            6,342,543.00
Guatemala         3,033,076.00

Challenges / galah


  • Clean & user-friendly naming conventions (lifecycle)
  • Coding non-standard evaluation (tidyselect)
  • Translating R to solr
  • Caching to improve performance (twice)
  • APIs: rate-limiting, multiple end-points, capturing errors
  • Support (documentation, ALA labs)
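To make the second and third challenges concrete, here is a minimal base-R sketch of how filter expressions might be captured with non-standard evaluation and rewritten as a solr-style query string. It is an illustration only: `to_solr()` is a hypothetical helper, the three supported operators and the output syntax are assumptions, and none of this is galah's actual implementation.

```r
# Hypothetical helper (not part of galah): translate simple filter
# expressions such as `year >= 2010` into a solr-style query string.
to_solr <- function(...) {
  # Capture the arguments unevaluated (non-standard evaluation)
  exprs <- as.list(substitute(list(...)))[-1]
  env <- parent.frame()

  fragments <- vapply(exprs, function(e) {
    # Only binary calls of the form `field <operator> value` are handled
    if (!is.call(e) || length(e) != 3) {
      stop("Unsupported expression: ", deparse(e))
    }
    op    <- as.character(e[[1]])
    field <- as.character(e[[2]])  # left-hand side is treated as a field name
    value <- eval(e[[3]], env)     # right-hand side is evaluated as ordinary R

    # Quote character values, as in the hand-written solr strings
    if (is.character(value)) value <- paste0('"', value, '"')

    switch(op,
      "==" = sprintf("%s:%s", field, value),
      ">=" = sprintf("%s:[%s TO *]", field, value),
      "<=" = sprintf("%s:[* TO %s]", field, value),
      stop("Unsupported operator: ", op)
    )
  }, character(1))

  # Multiple filters are combined with AND
  paste(fragments, collapse = " AND ")
}

query <- to_solr(year >= 2010,
                 dataResourceName == "iNaturalist Australia")
print(query)
```

The left-hand side of each expression is never evaluated, so users can write bare field names, while the right-hand side is evaluated in the calling environment so that ordinary R variables can be used as values.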

Thank you


Martin Westgate
Team Leader / Science & Decision Support / ALA
e: martin.westgate@csiro.au
t: @westgatecology
gh: @mjwestgate

galah development team
Matilda Stevenson
Dax Kellie
Shandiya Balasubramaniam
Peggy Newman


These slides were made using Quarto & RStudio