OpenAtlas INDIGO Vocabs Meeting 2022-04-22

Information updated during the meeting is marked in color. Every participant is invited to add to and adapt these notes.

Participants

  • Alexander Watzinger
  • Geert Verhoeven
  • Jona Schlegel
  • Massimiliano Carloni
  • Stefan Wogrin

Building on a previous meeting, the participants discussed some issues with the current design of the thesaurus. Three topics in particular were discussed, and Massimiliano kindly documented them.

Structure of the thesaurus

It is common practice to structure SKOS vocabularies as a hierarchy built with skos:broader and skos:narrower relationships between concepts (i.e., instances of class skos:Concept).

The INDIGO thesaurus, by contrast, uses collections (i.e., instances of class skos:Collection) to represent its structure. In the interface of the Vocabs service, collections are called “groups”. A collection is a container that brings together concepts that share certain characteristics. A concept (or even another collection) can only be linked to a collection as a member (property skos:member), NOT with skos:narrower! That property can only be used between concepts.

As a result, the INDIGO thesaurus currently has a very flat hierarchy: every concept is at the top level of the thesaurus, the only exceptions being “First Line” and “Second Outline”, which are narrower than “Outline”.
Instead, the structure of the thesaurus is expressed through collections, some of which are themselves members of other collections.

While this is formally and technically possible according to the SKOS specification, this kind of structure might render the thesaurus less understandable and usable for other services and users. For example, it is already creating issues with the import into OpenAtlas. Furthermore, it is difficult to manage such a vocabulary with the existing input tools, as also highlighted by Klaus on Mattermost. Therefore, we should consider if we want to restructure the vocabulary in order to create a SKOS hierarchy based on narrower-broader relationships. This would require rethinking the whole structure not only from a technical, but also (and foremost) from a semantic point of view.

Two important aspects to consider:
  • Alex stresses that the implementation in OpenAtlas is not a major factor to consider, since he can adjust it in case we need to maintain collections.
  • Alex correctly points out that broader concepts and collections may carry different meanings. In case 1, for example, the concept “Green” can itself be used as an option to describe an object, whereas in case 2 the collection “My favorite colors” is not a selectable option, only a ‘container’.
Case 1: Hierarchy with broader-narrower concepts
  • Green
    • Light green
    • Dark green
Case 2: Structure with collections
  • My favorite colors
    • Light green
    • Dark green
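To make the difference between the two cases concrete, the following Python sketch prints both as SKOS triples in Turtle syntax. The namespace https://example.org/colors/ and the camelCase identifiers are hypothetical placeholders. In case 1 every node is a skos:Concept linked by skos:broader/skos:narrower; in case 2 the container is a skos:Collection linked to its members by skos:member.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "https://example.org/colors/"  # hypothetical namespace for the example


def hierarchy_triples(broader: str, narrower: list[str]) -> list[str]:
    """Case 1: a broader concept whose narrower concepts point back to it."""
    lines = [f"<{BASE}{broader}> a skos:Concept ."]
    for concept in narrower:
        lines.append(f"<{BASE}{concept}> a skos:Concept ; skos:broader <{BASE}{broader}> .")
        lines.append(f"<{BASE}{broader}> skos:narrower <{BASE}{concept}> .")
    return lines


def collection_triples(collection: str, members: list[str]) -> list[str]:
    """Case 2: a collection that groups concepts via skos:member only."""
    lines = [f"<{BASE}{collection}> a skos:Collection ."]
    for concept in members:
        lines.append(f"<{BASE}{concept}> a skos:Concept .")
        lines.append(f"<{BASE}{collection}> skos:member <{BASE}{concept}> .")
    return lines


if __name__ == "__main__":
    print(SKOS_PREFIX)
    print("# Case 1: hierarchy with broader-narrower concepts")
    print("\n".join(hierarchy_triples("green", ["lightGreen", "darkGreen"])))
    print("# Case 2: structure with collections")
    print("\n".join(collection_triples("myFavoriteColors", ["lightGreen", "darkGreen"])))
```

Only the case 1 output gives an input tool a parent that is itself a concept (“green”) and can therefore be offered as an option; in case 2 the container is never a concept.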

Identifiers

All participants agree that it is best to use names of concepts and collections in identifiers instead of numbers.
For example, for concept “Commissioned Work”, we would change the identifier as follows:
(old) https://vocabs.acdh.oeaw.ac.at/indigo/0018
(new) https://vocabs.acdh.oeaw.ac.at/indigo/commissionedWork
assuming that we adopt the camelCase convention.
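A minimal sketch of how concept labels could be turned into camelCase identifiers, assuming labels consist of ASCII letters, digits and spaces (other characters, including umlauts, are simply dropped here, which a real implementation would have to handle). The helper names are hypothetical; the base URI is the one from the example above. Because different labels can collapse to the same identifier (e.g. “Commissioned Work” and “Commissioned work”), the sketch also checks for collisions.

```python
import re

BASE_URI = "https://vocabs.acdh.oeaw.ac.at/indigo/"


def to_camel_case(label: str) -> str:
    """Turn a label such as "Commissioned Work" into "commissionedWork"."""
    # Assumption: only ASCII letters and digits are kept; everything else separates words.
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError(f"label {label!r} yields an empty identifier")
    first, *rest = words
    return first.lower() + "".join(w[0].upper() + w[1:].lower() for w in rest)


def concept_uri(label: str) -> str:
    """Build the (hypothetical) concept URI from its label."""
    return BASE_URI + to_camel_case(label)


def check_unique(labels: list[str]) -> dict[str, str]:
    """Map identifiers to labels, failing if two labels collapse to the same identifier."""
    seen: dict[str, str] = {}
    for label in labels:
        ident = to_camel_case(label)
        if ident in seen:
            raise ValueError(f"{label!r} and {seen[ident]!r} both map to {ident!r}")
        seen[ident] = label
    return seen


if __name__ == "__main__":
    print(concept_uri("Commissioned Work"))
    # https://vocabs.acdh.oeaw.ac.at/indigo/commissionedWork
```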

Versioning

Jona asks whether it is possible to compare different versions of the thesaurus, as in the diff view on GitHub. Massimiliano answers that, unfortunately, this is possible neither in the Vocabs service nor in ARCHE. The only way to track changes is to update the version number according to semantic versioning rules and keep a change log in the metadata of the thesaurus (for example, by using the Dublin Core “description” property).
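The semantic versioning rule mentioned above can be sketched in Python as follows. The mapping from kinds of thesaurus changes to version parts (CHANGE_LEVEL) is a hypothetical convention, not an agreed INDIGO policy; it treats anything that breaks existing concept URIs as a major change.

```python
# Hypothetical convention: which version part a given kind of change bumps.
CHANGE_LEVEL = {
    "concept_removed": "major",     # existing URIs stop resolving
    "identifier_changed": "major",  # e.g. renaming a concept URI
    "concept_added": "minor",       # backwards-compatible addition
    "label_fixed": "patch",         # typo in a label or definition
}
ORDER = ["patch", "minor", "major"]


def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version and reset the lower parts."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def next_version(version: str, changes: list[str]) -> str:
    """Pick the highest-ranking bump required by the listed changes."""
    if not changes:
        raise ValueError("no changes, no new release")
    level = max((CHANGE_LEVEL[c] for c in changes), key=ORDER.index)
    return bump(version, level)


if __name__ == "__main__":
    print(next_version("1.2.3", ["label_fixed", "concept_added"]))  # 1.3.0
```

Under this hypothetical mapping, switching from numeric identifiers to camelCase names would count as a major release, since the old URIs would change.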

However, we could still store the different thesaurus releases on GitHub and archive them automatically into ARCHE (a similar workflow will soon be implemented for the ARCHE ontology itself). This would make it possible to compare releases directly on GitHub, for example with this method: https://docs.github.com/en/repositories/releasing-projects-on-github/comparing-releases
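GitHub's release comparison works on text lines, so it is most useful when the dumps are serialized consistently. One option, sketched below under the assumption that each release is exported as N-Triples (one triple per line) without blank nodes, is to compare releases as sets of triples, which makes the result independent of triple order. The function names and example data are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def load_ntriples(text: str) -> set[str]:
    """Collect the triples of an N-Triples dump, skipping blank lines and comment lines."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            triples.add(line)
    return triples


def diff_releases(old: str, new: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) triples between two releases, ignoring triple order."""
    old_triples, new_triples = load_ntriples(old), load_ntriples(new)
    return new_triples - old_triples, old_triples - new_triples


# Hypothetical excerpts of two releases (note the different triple order).
OLD = f"""<https://example.org/outline> <{SKOS}prefLabel> "Outline"@en .
<https://example.org/firstLine> <{SKOS}broader> <https://example.org/outline> .
"""
NEW = f"""<https://example.org/firstLine> <{SKOS}broader> <https://example.org/outline> .
<https://example.org/outline> <{SKOS}prefLabel> "Outlines"@en .
"""

if __name__ == "__main__":
    added, removed = diff_releases(OLD, NEW)
    for triple in sorted(removed):
        print("-", triple)
    for triple in sorted(added):
        print("+", triple)
```

Only the changed label is reported; the reordered skos:broader triple is not, which a plain line diff of the two files would flag.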

Questions and answers after the meeting in the chat

Asked by Jona and answered by Klaus.

Versioning

Question
Would it at least be possible to track changes, e.g. to go back in the history or see the implemented changes? I am thinking of something similar to the diff view on GitHub.

Answer
If we are talking about versioning in Skosmos (the presentation layer on vocabs.acdh-dev.oeaw.ac.at), the answer is simple: there is no versioning. At ACDH-CH we introduced a custom versioning workflow by adding dumps of the vocabularies to the ARCHE repository. Visualizing differences is not implemented there, but custom scripts could add it. Generally, we agree on a stable version of a vocabulary, which is published on the public server. This version should not be under development (i.e., we avoid making individual changes on the public server). Each new version on the public server should raise the version number and lead to the upload of a new dump to ARCHE. Skosmos offers no way to go back in history, but it is possible to see changes between versions if the timestamps are set correctly (these timestamps have to be set individually for each changed concept in the input tool).
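The remark about timestamps can be illustrated with a small sketch that lists the concepts modified after a given release date. It assumes the per-concept timestamps (for example dct:modified values) have already been extracted into a dictionary and are ISO 8601 strings without a time zone designator; the concept identifiers are hypothetical. A concept whose timestamp was not updated in the input tool will not show up.

```python
from datetime import datetime


def changed_since(modified: dict[str, str], release: str) -> list[str]:
    """Return the concepts whose timestamp is later than the release date, sorted by name."""
    cutoff = datetime.fromisoformat(release)
    return sorted(
        concept
        for concept, stamp in modified.items()
        if datetime.fromisoformat(stamp) > cutoff
    )


# Hypothetical per-concept timestamps, e.g. read from dct:modified.
MODIFIED = {
    "commissionedWork": "2022-05-02T09:30:00",
    "outline": "2022-03-10T14:00:00",
    "firstLine": "2022-04-25",
}

if __name__ == "__main__":
    print(changed_since(MODIFIED, "2022-04-22"))  # ['commissionedWork', 'firstLine']
```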

Structure

Question
So, from a thesaurus/vocabulary standpoint, what would be best practice regarding collections/groups? Are they not considered helpful for structuring a thesaurus? Are there good examples of collections being used in thesauri that are linked to databases and their structure?
How to proceed: Why is the "Concept Idea" not OK? Is it because it is too similar to the term 'concept' used in SKOS? Or do you mention it only because we could not use it in OpenAtlas, since it would be a collection/group there as well? We actually do not want to use it as a type/attribute to describe a graffito. We are aware that we can (and also want to) use only concepts in OpenAtlas.

Answer
Formally, the use of collections/groups is correct, and there are some examples where collections are used in parallel to a hierarchy. In practice, it depends strongly on the input tool you use. Most vocabularies use only the hierarchical view because the input fields of input tools can usually handle only the hierarchy. Collections are more complicated: they have terms that are not defined as concepts, and they may add complexity that select/combo boxes cannot handle very well. For my projects, I can only say that I tell the teams that the input form is built from the hierarchy and that collections are ignored by the input tool. Collections are simply nice for the presentation layer in Skosmos and may give helpful background information to people browsing such a vocabulary.
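Building the input form from the hierarchy while ignoring collections can be sketched as a depth-first walk over skos:broader links. This Python sketch assumes each concept has at most one broader concept, that every broader concept is itself in the input, and that the hierarchy has no cycles; collection memberships are simply never passed in. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Optional


def select_options(broader: dict[str, Optional[str]]) -> list[str]:
    """Flatten a concept hierarchy into indented options for a select box.

    `broader` maps each concept label to its broader concept (None for top concepts).
    Collections are deliberately not part of the input.
    """
    children: dict[Optional[str], list[str]] = defaultdict(list)
    for concept, parent in broader.items():
        children[parent].append(concept)

    options: list[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        # Depth-first: each concept is followed directly by its narrower concepts.
        for concept in sorted(children[parent]):
            options.append("  " * depth + concept)
            walk(concept, depth + 1)

    walk(None, 0)
    return options


if __name__ == "__main__":
    for option in select_options({"Outline": None, "First Line": "Outline", "Commissioned Work": None}):
        print(option)
```

With a flat thesaurus such as the current INDIGO one, almost every option ends up at the top level, and the grouping expressed by collections does not appear in the form at all.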

Updated by Alexander Watzinger almost 2 years ago · 3 revisions
