Peggy Newman, Martin Westgate, Amanda Buyan, Dax Kellie & Shandiya Balasubramaniam

The problem


For researchers, getting data out of GBIF nodes is easy…

…but sharing your own data is hard.

Hurdles


  • Darwin Core Standard formatting isn’t easy (e.g., .xml)
  • Existing documentation isn’t well-suited to newbies
  • Poor integration with existing workflows (e.g., in R or Python)
  • Sharing data is low on priority list

Q: How can we help researchers share biodiversity data?

galaxias (and friends)


galaxias: Build, check & publish Darwin Core Archives (DwC-As)
corella: Convert a tibble to Darwin Core
delma: Convert markdown to EML or xml

Darwin Core

An archive is a .zip file containing three things:

  • data (.csv format)
  • metadata (.eml format)
  • schema (.xml format)
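Conceptually, the bundling step is simple: the three components above go into one .zip file. A minimal sketch using the zip package (file names are illustrative; galaxias automates this with build_archive() later):

```r
library(zip)

# The three components of a Darwin Core Archive (illustrative paths)
files <- c(
  "occurrences.csv",  # data
  "eml.xml",          # metadata (EML)
  "meta.xml"          # schema
)

# Bundle them into a single archive
zip::zip("dwc-archive.zip", files)
```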

Process



data → metadata → schema → archive → validate → submit

Data

Load galaxias

library(galaxias)



delma and corella are loaded automatically

Data

Load an example dataset

library(readr)

df <- read_csv("my_example_data.csv")
df
# A tibble: 2 × 5
  latitude longitude date       time  species                 
     <dbl>     <dbl> <chr>      <chr> <chr>                   
1    -35.3      149. 14-01-2023 10:23 Callocephalon fimbriatum
2    -35.3      149. 15-01-2023 11:25 Eolophus roseicapilla   

Data

How should we convert this dataset to Darwin Core?

suggest_workflow(df)

Data

If we follow that advice:

df_dwc <- df |>
  set_occurrences(occurrenceID = sequential_id(),
                  basisOfRecord = "HumanObservation") |> 
  set_coordinates(decimalLatitude = latitude, 
                  decimalLongitude = longitude) |>
  set_datetime(eventDate = lubridate::dmy(date),
               eventTime = lubridate::hm(time)) |>
  set_scientific_name(scientificName = species, 
                      taxonRank = "species")

df_dwc
# A tibble: 2 × 8
  basisOfRecord    occurrenceID decimalLatitude decimalLongitude eventDate 
  <chr>            <chr>                  <dbl>            <dbl> <date>    
1 HumanObservation 01                     -35.3             149. 2023-01-14
2 HumanObservation 02                     -35.3             149. 2023-01-15
# ℹ 3 more variables: eventTime <Period>, scientificName <chr>, taxonRank <chr>

Data

Save as occurrences.csv:

use_data(df_dwc)
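A quick sanity check is to read the saved file back (hypothetical path, assuming use_data() writes into the data-publish/ folder that build_archive() later zips):

```r
library(readr)

# Confirm the Darwin Core columns survived the round trip
read_csv("data-publish/occurrences.csv") |>
  names()
```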

Process



data → metadata → schema → archive → validate → submit

Metadata

Generate a metadata file

use_metadata_template() # creates the following file:
# Dataset
 
 ## Title
 
 A Sentence Giving Your Dataset Title In Title Case
 
 ## Abstract
 
 A paragraph outlining the content of the dataset
 
 ## Creator
 
 ### Individual name
 
 #### Surname

Metadata

Convert to EML

use_metadata("metadata.Rmd") # creates the following file:
<?xml version="1.0" encoding="UTF-8"?>
 <eml:eml xmlns:d="eml://ecoinformatics.org/dataset-2.1.0" xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/terms/" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://rs.gbif.org/schema/eml-gbif-profile/1.3/eml-gbif-profile.xsd" system="R-paperbark-package" scope="system" xml:lang="en">
   <dataset>
     <title>A Sentence Giving Your Dataset Title In Title Case</title>
     <abstract>A paragraph outlining the content of the dataset</abstract>
     <creator>
       <individualName>
         <surname>Person</surname>
         <givenName>Steve</givenName>
         <electronicMailAddress>example@email.com</electronicMailAddress>
       </individualName>
       <organisationName>Put your organisation name here</organisationName>
       <address>
         <deliveryPoint>215 Road Street</deliveryPoint>
         <city>Canberra</city>

Process



data → metadata → schema → archive → validate → submit

Archive

An automated process for zipping the data-publish/ folder.

build_archive()
Data (minimum of one)
  • occurrences.csv ✔
  • events.csv      ✖
  • multimedia.csv  ✖
Metadata
  • eml.xml         ✔
Schema
  • meta.xml        ✔

Archive

We can check that the correct files are present.

fs::path_abs("../dwc-archive.zip") |>
  zip::zip_list() |>
  tibble::as_tibble() |>
  dplyr::select(filename:timestamp)
# A tibble: 3 × 4
  filename        compressed_size uncompressed_size timestamp          
  <chr>                     <dbl>             <dbl> <dttm>             
1 occurrences.csv             194               283 2025-07-01 02:34:30
2 eml.xml                     684              1452 2024-12-12 04:21:22
3 meta.xml                    509              2145 2024-12-12 04:21:22


The schema file (meta.xml) has been built automatically.

Process



data → metadata → schema → archive → validate → submit

Validate

# validate locally
check_directory() 

# validate via GBIF API
check_archive(username = "a_gbif_user",
              email = "my@email.com",
              password = "a_secure_password")

Process



data → metadata → schema → archive → validate → submit

Submitting

Run submit_archive() to create an issue on the data-publication repository
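Put together, the whole pipeline is a handful of calls. A sketch, assuming submit_archive() needs no arguments beyond what the slides show:

```r
library(galaxias)

build_archive()    # zip the data-publish/ folder into a DwC-A
check_directory()  # validate locally before submitting
submit_archive()   # open an issue on the data-publication repository
```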

Process



data metadata schema archive validate submit

Benefits of galaxias


  • Darwin Core Standard formatting is easy (e.g., .xml)
  • Documentation well-suited to newbies
  • Good integration with existing workflows (e.g., in R or Python)
  • Sharing data is on the priority list (?)

Thank you


Peggy Newman
Martin Westgate
Amanda Buyan
Dax Kellie
Shandiya Balasubramaniam

galaxias
corella
delma
galah