
Data standards

The data available through GBIF, SBDI, and associated services result from collaboration among GBIF network participants and publishers who follow shared rules and conventions. These conventions are used to describe, record, and structure a wide range of datasets from institutions worldwide. Common standards are what make it possible to consolidate the large collection of primary biodiversity records in the GBIF index.

In biodiversity, the main body that develops and maintains data standards is Biodiversity Information Standards (commonly known as TDWG, pronounced tad-wig). TDWG is a nonprofit scientific and educational association affiliated with the International Union of Biological Sciences. Its main focus is establishing standards for the exchange of biological and biodiversity data. The group was formerly called the Taxonomic Databases Working Group, and the biodiversity community still commonly refers to it by the abbreviation TDWG.

Commonly used standards

Darwin Core

The Darwin Core Standard (DwC) offers a stable, straightforward and flexible framework for compiling biodiversity data from varied and variable sources. The majority of the datasets shared through GBIF and SBDI are published using the Darwin Core Archive format (DwC-A).
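To make the archive format more concrete, here is a minimal Python sketch of how a Darwin Core Archive could be read. It assumes the usual layout described in the Darwin Core text guidelines: a zip file containing a `meta.xml` descriptor that points to a delimited core data file and maps each column to a Darwin Core term. The helper names (`build_demo_archive`, `read_core_records`), the file name `occurrence.txt`, and the two records are invented for illustration. The sketch ignores extensions and default field values, so it should not be read as a complete reader.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the meta.xml descriptor of a Darwin Core Archive.
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Illustrative descriptor: one core file with three columns (assumed layout).
DEMO_META = r"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>"""

# Invented records, used only to show the file layout.
DEMO_DATA = (
    "id\tscientificName\tcountry\n"
    "obs-1\tParus major\tSweden\n"
    "obs-2\tAlces alces\tSweden\n"
)


def build_demo_archive():
    """Build a tiny in-memory archive (hypothetical helper for this example)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("meta.xml", DEMO_META)
        zf.writestr("occurrence.txt", DEMO_DATA)
    return buffer.getvalue()


def read_core_records(archive_bytes):
    """Return the core rows of an archive as a list of dicts keyed by term name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        meta = ET.fromstring(zf.read("meta.xml"))
        core = meta.find("dwc:core", DWC_TEXT_NS)
        location = core.find("dwc:files/dwc:location", DWC_TEXT_NS).text
        # meta.xml spells the tab delimiter as the two characters "\t".
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        header_lines = int(core.get("ignoreHeaderLines", "0"))

        # Map each column index to the short name of its Darwin Core term.
        columns = {}
        id_element = core.find("dwc:id", DWC_TEXT_NS)
        if id_element is not None:
            columns[int(id_element.get("index"))] = "id"
        for field in core.findall("dwc:field", DWC_TEXT_NS):
            if field.get("index") is not None:
                term = field.get("term").rsplit("/", 1)[-1]
                columns[int(field.get("index"))] = term

        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return [
        {name: row[index] for index, name in columns.items()}
        for row in rows[header_lines:]
    ]


if __name__ == "__main__":
    for record in read_core_records(build_demo_archive()):
        print(record)
```

The point of the descriptor is that column meaning comes from the term mapping in `meta.xml` rather than from the header row, which is why the sketch skips the header lines and builds its keys from the `term` attributes.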

EML: Ecological Metadata Language

Ecological Metadata Language, or EML, is a metadata standard that records information about ecological datasets in a series of modular and extensible XML document types. All dataset descriptions in GBIF rely on ‘metadata’ (that is, information about the data) written in the open-source EML standard, which is administered and maintained by the Knowledge Network for Biocomplexity. Each Darwin Core Archive includes an EML file (written in XML) as one of its components.
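As a rough illustration of what such a metadata file contains, the following Python sketch pulls a few descriptive fields out of an EML document. It assumes the common EML layout in which a namespaced `eml:eml` root holds an unqualified `dataset` element with `title`, `creator`, and `abstract` children. The sample document, the `packageId` value, and the function name `summarize_eml` are made up for this example, and real EML files carry many more elements than the three read here.

```python
import xml.etree.ElementTree as ET

# Invented sample document; only the element layout is meant to be realistic.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         packageId="demo-1" system="http://gbif.org">
  <dataset>
    <title>Demo bird observations</title>
    <creator>
      <individualName>
        <givenName>Ada</givenName>
        <surName>Example</surName>
      </individualName>
      <organizationName>Example Museum</organizationName>
    </creator>
    <abstract>
      <para>Invented records used only to illustrate the file layout.</para>
    </abstract>
  </dataset>
</eml:eml>"""


def summarize_eml(eml_xml):
    """Extract title, creator names, and abstract text from an EML document."""
    root = ET.fromstring(eml_xml)
    # Child elements of the eml:eml root are assumed to be unqualified.
    dataset = root.find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found")

    creators = []
    for creator in dataset.findall("creator"):
        given = creator.findtext("individualName/givenName", default="")
        surname = creator.findtext("individualName/surName", default="")
        person = " ".join(part for part in (given, surname) if part)
        # Fall back to the organization when no person is named.
        creators.append(person or creator.findtext("organizationName", default=""))

    paragraphs = [
        para.text.strip()
        for para in dataset.findall("abstract/para")
        if para.text
    ]
    return {
        "title": dataset.findtext("title", default="").strip(),
        "creators": creators,
        "abstract": " ".join(paragraphs),
    }


if __name__ == "__main__":
    print(summarize_eml(SAMPLE_EML))
```

In a Darwin Core Archive, the same function could be applied to the bytes of the archive's EML file after reading it from the zip.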