Data & file formats
Data format
Standardization facilitates data discovery, integration, sharing and interoperability. EurOBIS (and OBIS) use the OBIS-ENV data format, which is based on the Darwin Core Archive (DwC-A) standard for biodiversity data. The Darwin Core standard includes a glossary of terms intended to facilitate the sharing of information about biological diversity by providing identifiers, labels, and definitions. In addition, it determines how your data will be structured (i.e. the number of tables).
We can divide the data format into three main blocks, each with its own specific requirements and characteristics:
Data structure
Marine biological data often include measurements of habitat features (such as physical and chemical variables of the environment) and biotic measurements (such as body size, abundance and biomass), as well as details on the sampling or observation methods, equipment, and sampling effort.
In order to capture all this information, the conceptual data model of the Darwin Core Archive is a “star schema” with a core table in the center of the star (the sampling event) and extension tables radiating out of the center. In practice, EurOBIS (and OBIS) use a subset of 1 to 3 tables to represent the data. In most cases, we use all three tables.
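The three tables (event, occurrence, and extended measurement or fact) are related via the eventID and occurrenceID fields: each occurrence record refers to its sampling event through eventID, and each measurement record refers to an event through eventID or to a specific occurrence through occurrenceID.
(Detailed information on the OBIS-ENV schema can be found in De Pooter et al. 2017.)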
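The relation between the three tables can be illustrated with a minimal Python sketch. The sample rows, their values and the helper name link_tables are hypothetical and only serve as an illustration; the sketch assumes that a measurement row with an empty occurrenceID describes the sampling event itself.

```python
from collections import defaultdict

# Hypothetical sample rows; all identifiers and values are invented for illustration.
events = [
    {"eventID": "st1", "eventDate": "2015-06-01"},
]
occurrences = [
    {"occurrenceID": "occ1", "eventID": "st1", "scientificName": "Abra alba"},
    {"occurrenceID": "occ2", "eventID": "st1", "scientificName": "Nephtys cirrosa"},
]
emof = [
    # No occurrenceID: the measurement describes the sampling event itself.
    {"eventID": "st1", "occurrenceID": "", "measurementType": "temperature",
     "measurementValue": "14.2", "measurementUnit": "degC"},
    # With occurrenceID: the measurement describes one occurrence.
    {"eventID": "st1", "occurrenceID": "occ1", "measurementType": "abundance",
     "measurementValue": "12", "measurementUnit": "individuals"},
]


def link_tables(events, occurrences, emof):
    """Nest occurrences and measurements under their parent event."""
    event_facts = defaultdict(list)       # eventID -> event-level measurements
    occurrence_facts = defaultdict(list)  # occurrenceID -> occurrence-level measurements
    for row in emof:
        if row.get("occurrenceID"):
            occurrence_facts[row["occurrenceID"]].append(row)
        else:
            event_facts[row["eventID"]].append(row)

    # Group occurrences by the event they belong to, attaching their measurements.
    occurrences_by_event = defaultdict(list)
    for occ in occurrences:
        occurrences_by_event[occ["eventID"]].append(
            {**occ, "measurements": occurrence_facts[occ["occurrenceID"]]}
        )

    return {
        ev["eventID"]: {
            **ev,
            "measurements": event_facts[ev["eventID"]],
            "occurrences": occurrences_by_event[ev["eventID"]],
        }
        for ev in events
    }


linked = link_tables(events, occurrences, emof)
```

In this sketch the event sits at the center of the star: the temperature measurement is attached directly to the event, while the abundance measurement is attached to one of its occurrences.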
Field nomenclature
EurOBIS follows the field names of the OBIS-ENV schema, which adds some terms to the generally accepted ones listed on the Biodiversity Information Standards (TDWG) website (e.g. those used in the IPT).
The Darwin Core standard contains more terms than the ones used in the OBIS-ENV schema. This spreadsheet template contains a detailed summary of the most common and mandatory fields for EurOBIS.
Content
Data interoperability is achieved through the use of controlled vocabularies. In addition to the field names, the content of the data itself has to follow certain standards. An explanation of how the mandatory fields should be populated is available in the OBIS manual.
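As an illustration of what such a content check could look like, the Python sketch below tests a single record for empty mandatory fields and for values outside a controlled list. The list of required terms and the allowed values are assumptions made for this example and are not complete; the OBIS manual remains the authoritative reference. The function name check_record is hypothetical.

```python
# Assumption for illustration: consult the OBIS manual for the authoritative list.
REQUIRED_TERMS = (
    "occurrenceID", "eventDate", "decimalLatitude", "decimalLongitude",
    "scientificName", "scientificNameID", "occurrenceStatus", "basisOfRecord",
)

# Assumption for illustration: a partial set of controlled values per field.
CONTROLLED_VALUES = {
    "occurrenceStatus": {"present", "absent"},
    "basisOfRecord": {
        "HumanObservation", "MachineObservation", "PreservedSpecimen",
        "MaterialSample", "LivingSpecimen", "FossilSpecimen",
    },
}


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    # First pass: every required term must be present and non-empty.
    for term in REQUIRED_TERMS:
        value = record.get(term)
        if value is None or str(value).strip() == "":
            problems.append(f"missing value for {term}")
    # Second pass: filled-in values must come from the controlled list.
    for term, allowed in CONTROLLED_VALUES.items():
        value = record.get(term)
        if value and value not in allowed:
            problems.append(f"{term}={value!r} is not in the controlled list")
    return problems
```

A record that passes returns an empty list; otherwise each problem is reported as a short message that can be sent back to the data provider.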
File format
Preferably, datasets should be submitted as a Darwin Core Archive through the Integrated Publishing Toolkit (IPT) developed by GBIF. If you would like to submit data through IPT, please email us at info@eurobis.org.
However, data can also be submitted to EurOBIS in a data format different from OBIS-ENV, as well as in different file formats:
- Excel spreadsheet (.xls, .xlsx)
- Access database (.mdb, .accdb)
- Comma/tab separated values (.csv)
- Text file (.txt)
- Others (email info@eurobis.org for more information)
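For the delimited text formats in the list above, a data provider may want to check how a file will be parsed before submitting it. The following Python sketch is one simple way to do so, assuming the first line is a header row; the function name read_delimited and the heuristic (pick the candidate delimiter that occurs most often in the header) are illustrative choices, not part of any EurOBIS tooling.

```python
import csv
import io


def read_delimited(text, candidates=(",", "\t", ";")):
    """Parse delimited text into a list of dicts keyed by the header names.

    The delimiter is guessed as the candidate occurring most often in the
    header line. Assumes the text is non-empty and starts with a header row.
    """
    header = text.splitlines()[0]
    delimiter = max(candidates, key=header.count)
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)
```

Printing the keys of the first returned row shows which column names were recognized, which makes it easy to compare them against the expected field names.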
After submission, the EurOBIS data management team will contact the data provider to report on the progress of the dataset and to raise any questions or remarks about it.
The EurOBIS data infrastructure is used as the central hub for making biological data available within the biological lot of the European Marine Observation and Data Network (EMODnet Biology). In this context a Guidance to publishing data in EurOBIS and EMODnet Biology was developed in order to facilitate and encourage data sharing and quality control through the network. More information about the relation between EurOBIS and EMODnet can be found here.
The EurOBIS data infrastructure is also part of the LifeWatch Species Information Backbone. This Backbone facilitates the standardization of species data and the (virtual) integration of the many distributed biodiversity data repositories and operating facilities. In turn, the Backbone is the driving force behind the species information services of the Belgian LifeWatch e-Lab, through which EurOBIS data can also be consulted, queried and explored.