Wikipedia:Meetup/Cost MOBILISE Wikidata Workshop

Video of one of the talks given at the workshop, with the thumbnail highlighting the Scholia page for the event.

This is an invitation-only event.

Purpose

A workshop to develop the data model for taxonomic and nomenclatural data in Wikidata.

Goal

To improve the interoperability of Wikidata data on biological taxonomy with other taxonomic data sources.

How

By bringing together Wikidata specialists, database managers and biodiversity informatics experts to facilitate exchange of knowledge.

Item for the event

Wikidata: Q84943795 (Scholia)

Background

In just seven years, Wikidata has become one of the most active and influential collections of data on the internet. It currently holds more than 51 million data items and has more than 3 million registered users, of which about 20,000 are active contributors (https://www.wikidata.org/wiki/Special:Statistics). It can be edited by anyone, and many organizations and individuals have contributed data to it, including many people who would be considered citizen scientists.

An important aspect of Wikidata is its data model, which is built around items and properties. An item might be, for instance, a legal entity such as an institution (e.g. Meise Botanic Garden, Q52224718) or a concept (e.g. determinism, Q131133). Each item is described with statements in the form of triples consisting of a subject, predicate and object. For example, the gender of a person (the subject) would be described with the property (the predicate) ‘sex or gender’ (P21) with an object of the class gender, such as female (Q6581072). However, objects do not necessarily have to be Wikidata items; they can also be text strings or values, such as numbers or dates. Unique to the Wikidata knowledge graph is that the provenance of these statements is captured in references and qualifiers, directly linked to the individual statements.
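The statement structure described above can be sketched in a few lines of Python. This is only an illustration of the subject, predicate and object pattern with attached references and qualifiers, not Wikidata's actual storage format. The class names `ItemId` and `Statement`, the placeholder subject `Q_PERSON` and the example.org URL are hypothetical; P21 and Q6581072 come from the text above, and P854 is assumed here to be the ‘reference URL’ property.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class ItemId:
    """Identifier of a Wikidata item, e.g. 'Q6581072'."""
    qid: str


# An object can be another item or a plain literal (string, number, date string).
Value = Union[ItemId, str, int, float]


@dataclass
class Statement:
    """One statement: a triple plus its qualifiers and references."""
    subject: ItemId
    prop: str                      # property ID, the predicate of the triple
    value: Value                   # the object of the triple
    qualifiers: Dict[str, Value] = field(default_factory=dict)
    references: List[Dict[str, Value]] = field(default_factory=list)

    def as_triple(self) -> Tuple[str, str, Value]:
        """Return the bare (subject, predicate, object) triple."""
        obj = self.value.qid if isinstance(self.value, ItemId) else self.value
        return (self.subject.qid, self.prop, obj)


# 'Q_PERSON' is a hypothetical placeholder, not a real item ID.
person = ItemId("Q_PERSON")
gender = Statement(
    subject=person,
    prop="P21",                    # 'sex or gender'
    value=ItemId("Q6581072"),      # female
    # Assumed: P854 is 'reference URL'; the URL itself is made up.
    references=[{"P854": "https://example.org/biography"}],
)

print(gender.as_triple())  # ('Q_PERSON', 'P21', 'Q6581072')
```

Keeping references and qualifiers on the statement itself, rather than on the item, mirrors the point made above: provenance is tied to each individual claim.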

Wikidata can be queried and edited both by humans and machines, and it contains data from many different domains of knowledge. All data on Wikidata are available in the public domain under a Creative Commons Zero (CC0) waiver, which eliminates any ownership barriers to the use of the data. Data can also be queried across databases, as long as those other databases also provide an open SPARQL endpoint. This is particularly useful for interlinking big data, which would otherwise overwhelm Wikidata. Data such as genetic sequence data (200 million sequences in GenBank) and biodiversity observations (> 1 billion in GBIF) would be too large to incorporate in Wikidata, but could be linked together with identifiers.
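As a rough sketch of what such a cross-database query could look like, the Python snippet below only builds a federated SPARQL query and the corresponding request URL; it does not send anything over the network. Several names are assumptions made for illustration: the remote endpoint `https://example.org/sparql` and the predicate `https://example.org/vocab/taxonKey` are hypothetical, P225 is taken to be ‘taxon name’ and P846 is taken to be ‘GBIF taxon ID’. In practice the remote endpoint would also have to be one that the Wikidata Query Service is permitted to contact.

```python
from urllib.parse import urlencode

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
REMOTE_ENDPOINT = "https://example.org/sparql"  # hypothetical external endpoint


def build_federated_query(remote_endpoint: str, limit: int = 10) -> str:
    """Join Wikidata taxa to records in another database via a shared identifier."""
    return f"""
SELECT ?taxon ?name ?externalId ?record WHERE {{
  ?taxon wdt:P225 ?name ;
         wdt:P846 ?externalId .
  SERVICE <{remote_endpoint}> {{
    ?record <https://example.org/vocab/taxonKey> ?externalId .
  }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    return WIKIDATA_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_federated_query(REMOTE_ENDPOINT)
url = build_request_url(query)
print(query)
```

The large dataset stays in the remote database; only the identifier stored in Wikidata is used to join the two sources inside the `SERVICE` block.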

Wikidata has many potential scientific uses in the life sciences ^[1]. The Gene Wiki project has proven its value in annotating and curating genomic data ^[2] ^[3]. The Scholia project is examining Wikidata’s use in scientometrics through creating citation networks ^[4] ^[5]. It is being used to examine scientific hypotheses on the causes of invasive species (https://www.wikidata.org/wiki/Wikidata:WikiProject_Wikidata_for_research/Hierarchy_of_Hypotheses_Workshop_2018). It can even be used in the visualization and examination of science history.

Wikidata already contains a large amount of data on the life sciences, including taxonomy, nomenclature and genetics. Its flexibility and openness mean that it is versatile and interoperable with a large number of other data sources. However, its data model for biological taxonomy and nomenclature needs to be improved to enable it to reach its full potential with biodiversity data.

Issue

The description of knowledge in Wikidata needs to be appropriate to make it useful to the many users of taxonomic names and knowledge. Currently the data model for taxonomic, nomenclatural and specimen data on Wikidata is not suitable to hold critical items, such as typification information, and is inconsistent with the International Code of Nomenclature for algae, fungi, and plants and the International Code of Zoological Nomenclature. Nor is it compatible with other community standards, such as Darwin Core. However, Wikidata is flexible enough that it can be adapted to such needs. We need to build a community consensus on how this model should be changed and provide the momentum to ensure these changes will be made. We propose a dedicated Wikidata workshop on taxonomic data to bring together the relevant stakeholders. These might include, but are not restricted to, representatives of Wikidata (https://www.wikidata.org/wiki/Wikidata:WikiProject_Taxonomy), taxonomic databases in Europe (COL+, IPNI, PESI, Index Fungorum, EASIN etc.), citizen science organizations, the Global Biodiversity Information Facility, biodiversity informaticians, Biodiversity Information Standards (TDWG) and open data organisations (RDA, Plazi).

We will examine how to improve the data model for biological nomenclature, how to link to data on typification, such as type specimens, how to codify different taxonomic concepts, and how to link to other relevant data, such as protologues, genetics and biogeography.

When and where

Thursday 13th - Friday 14th February 2020
Warsaw, Poland

Agenda

Overarching themes (to be considered by all groups in their discussions)

- Bias in Wikidata: How much of a problem is this? What can be done to address it?
  - Gender
  - Geographic
  - Taxonomic
- Referencing or the lack of it
- One Wikidata or multiple Wikibases

References

[1] Mitraka, E., Waagmeester, A., Burgstaller-Muehlbacher, S., Schriml, L., Su, A. I., & Good, B. M. (2015). Wikidata: A platform for data integration and dissemination for the life sciences and beyond. https://doi.org/10.1101/031971
[2] Burgstaller-Muehlbacher, S., Waagmeester, A., Mitraka, E., Turner, J., Putman, T. E., Leong, J., … Su, A. I. (2016). Wikidata as a semantic framework for the Gene Wiki initiative. Database, 2016, baw015. https://doi.org/10.1093/DATABASE/BAW015
[3] Putman, T. E., Lelong, S., Burgstaller-Muehlbacher, S., Waagmeester, A., Diesh, C., Dunn, N., … Good, B. M. (2017). WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata. Database, 2017(1). https://doi.org/10.1093/DATABASE/BAX025
[4] Nielsen, F. Å., Mietchen, D., & Willighagen, E. (2017). Scholia, Scientometrics and Wikidata. The Semantic Web: ESWC 2017 Satellite Events. https://doi.org/10.1007/978-3-319-70407-4_36
[5] Rasberry, L., Willighagen, E., Nielsen, F., & Mietchen, D. (2019). Robustifying Scholia: paving the way for knowledge discovery and research assessment through Wikidata. Research Ideas and Outcomes, 5, e35820. https://doi.org/10.3897/rio.5.e35820

Notes

Also see https://tools.wmflabs.org/scholia/topic/Q2013

Mitraka, E., Waagmeester, A., Burgstaller-Muehlbacher, S., Schriml, L., Su, A. I., & Good, B. M. (2015). Wikidata: A platform for data integration and dissemination for the life sciences and beyond. https://doi.org/10.1101/031971
Turland, N. J., Wiersema, J. H., Barrie, F. R., Greuter, W., Hawksworth, D. L., Herendeen, P. S., Knapp, S., Kusber, W.-H., Li, D.-Z., Marhold, K., May, T. W., McNeill, J., Monro, A. M., Prado, J., Price, M. J. & Smith, G. F. (eds.) 2018: International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code) adopted by the Nineteenth International Botanical Congress Shenzhen, China, July 2017. Regnum Vegetabile 159. Glashütten: Koeltz Botanical Books. DOI https://doi.org/10.12705/Code.2018
Wikidata/Strategy/2019 https://meta.wikimedia.org/wiki/Wikidata/Strategy/2019
