OpenAtlas to ARCHE meeting 2025-06-03, 11:30
Location: ACDH-CH, Bäckerstraße 13, meeting room 3D
Information updated during the meeting is shown in color and/or marked with ✅. Every participant is welcome to add and adapt.
The topic is the archiving of concluded projects that used OpenAtlas for data acquisition and structuring in ARCHE, the long-term archiving system.
Participants
- Martina Trognitz (ARCHE)
- Mateusz Żółtak
- Seta Štuhec
- Alexander Watzinger
- Bernhard Koschiček-Krombholz
- Nina Richards
Projects planned for archiving in 2025
- bITEM (most likely the first one)
- CONNEC
- Shahi (actual archiving in 2026 but we will prepare what is possible before)
Maybe we can also discuss the administrative status of these, e.g. deposition agreements (DE: Datenübergabevertrag).
General aim & questions
We need to establish common ground on what exactly we mean and aim for when speaking about 'archiving'.
We have three stakeholders' perspectives:
- the project PIs
  - are happy if their project data gets archived
- OpenAtlas
  - always store a full SQL dump; OpenAtlas allows 'restoring/loading' from an existing dump; user data is removed prior to dumping and other non-essential data is stripped
  - Data model: CIDOC CRM, so a serialised, re-usable file would also make sense (preferably ttl)
  - + files: complicate things; files are part of the model as 'document'
- ARCHE
  - Long-term preservation of digital (research) data to keep them findable and re-usable for as long as possible. ARCHE (or any other digital archive) intends to be a place where data can still be found even if their original hosting is no longer available. This requires the data collections to be self-contained, in the sense that a user can still extract meaning and information from the files presented in ARCHE without having to rely on external documentation or long-defunct software. This does not necessarily mean that the full functionality of the original data context has to be kept.
- What should/has to be kept from the data stored in and presented via OpenAtlas?
- Is there any additional material the project partners want to preserve?
- Is the archiving intended to be able to rebuild an OpenAtlas project?
- Should the preserved data also be aggregated by e.g. Kulturpool/Europeana?
Metadata requirements
- For a data collection to be properly represented in ARCHE we need metadata for which we have an OWL ontology, the ARCHE schema
- The collection as a whole is an instance of the class acdh:TopCollection and has mandatory and recommended metadata properties. As this is the entry point into the data collection, we recommend providing as much information as possible to help others understand what they are looking at
- Individual files represent an instance of the class acdh:Resource or, in some cases, acdh:Metadata. They come with a set of mandatory metadata properties, of which the following have to be provided by the project or by OpenAtlas
- acdh:hasTitle - title of entity, can be the file name (see below) or something else, multiple languages possible and alternative titles can be included as well with acdh:hasAlternativeTitle
- acdh:hasIdentifier - includes ARCHE IDs, as well as any external IDs in the form of a URI
- acdh:hasCategory - values come from https://vocabs.acdh.oeaw.ac.at/arche_category/en/
- acdh:hasMetadataCreator - those responsible for entering the metadata, can be multiple
- acdh:hasOwner -
- acdh:hasRightsHolder -
- acdh:hasLicensor -
- acdh:hasLicense -
- acdh:isPartOf - except for instances of acdh:TopCollection, every other entity in ARCHE is required to be part of a (top)collection (see section how to structure files)
- acdh:hasDepositor - as designated in deposition agreement
Note: There are more properties, many of which are also recommended, but this is still to be discussed
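As an illustration of the list above, the following is a minimal sketch of a pre-submission check that reports which of the listed mandatory properties are still missing from a file's metadata record. The record layout (a plain dictionary keyed by prefixed property names), the function name `missing_properties` and the example values are assumptions made for this sketch, not part of OpenAtlas or ARCHE. The check applies to file-level entities; acdh:isPartOf is not required for an acdh:TopCollection.

```python
# Mandatory properties to be supplied by the project or by OpenAtlas,
# as listed in the metadata requirements above.
MANDATORY = [
    "acdh:hasTitle",
    "acdh:hasIdentifier",
    "acdh:hasCategory",
    "acdh:hasMetadataCreator",
    "acdh:hasOwner",
    "acdh:hasRightsHolder",
    "acdh:hasLicensor",
    "acdh:hasLicense",
    "acdh:isPartOf",
    "acdh:hasDepositor",
]


def missing_properties(record: dict) -> list:
    """Return the mandatory properties that are absent or empty in `record`."""
    return [prop for prop in MANDATORY if not record.get(prop)]


# Hypothetical, incomplete record used for illustration only.
example = {
    "acdh:hasTitle": "Example image",
    "acdh:hasIdentifier": "https://example.org/files/123",
    "acdh:hasCategory": "<URI from the ARCHE category vocabulary>",
}

for prop in missing_properties(example):
    print("missing:", prop)
```

Such a report could be attached to the file catalog so that gaps are visible before ingestion.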
File requirements
- File formats should adhere to ARCHE preferred and recommended formats
Q: Why not .glb? -> A: .glb is also acceptable
How to deal with filenames
- In OpenAtlas files are stored with integer + extension, e.g. 123.png
- We can keep this for ARCHE and use that for the ID
- Every file also has a title (free text, so it can contain special characters), but how descriptive the titles are depends on the project partners
- Titles can be mapped to acdh:hasTitle or acdh:hasAlternativeTitle (multiple languages possible)
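The filename handling described above could be sketched as follows: the integer part of the OpenAtlas filename is kept for the identifier, and the free-text title is mapped to acdh:hasTitle, falling back to the filename when the title is empty. `BASE_URI`, the function name `file_to_metadata` and the dictionary layout are hypothetical; the actual identifier pattern is still to be agreed.

```python
from pathlib import PurePosixPath

# Hypothetical base URI, for illustration only.
BASE_URI = "https://example.org/openatlas/file/"


def file_to_metadata(filename: str, title: str, lang: str = "en") -> dict:
    """Derive identifier and title from an OpenAtlas file name and title."""
    stem = PurePosixPath(filename).stem  # "123.png" -> "123"
    if not stem.isdigit():
        raise ValueError(f"expected '<integer>.<extension>', got {filename!r}")
    return {
        "acdh:hasIdentifier": BASE_URI + stem,
        # Titles are free text; fall back to the filename if the title is empty.
        "acdh:hasTitle": {lang: title.strip() or filename},
    }


print(file_to_metadata("123.png", "Dodo skeleton"))
```

Keeping the title as a language-keyed dictionary leaves room for the multiple languages mentioned above.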
How to structure files
- It might be possible to categorize files into "sub folders" but this would be additional effort which has to be done for every project individually
- One question is what advantage categorizing the files would bring and whether it would be worth the effort.
- Depends on the project & amount of files
- Maybe case study? Not used uniformly -> No, because problematic on many levels, e.g. not every project has this, case study can be used multiple times, ...
- Structure by ARCHE resource category (by data type)
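Structuring by data type could be automated roughly as in the sketch below, which groups filenames into folders based on their extension. The extension-to-folder mapping and the folder names are assumptions for illustration; the real grouping would follow the ARCHE resource categories.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to folder name.
FOLDER_BY_EXTENSION = {
    ".png": "image",
    ".jpg": "image",
    ".tif": "image",
    ".glb": "3d_model",
    ".pdf": "document",
}


def group_by_type(filenames):
    """Group filenames into folders by extension; unknown types go to 'other'."""
    groups = defaultdict(list)
    for name in filenames:
        extension = PurePosixPath(name).suffix.lower()
        groups[FOLDER_BY_EXTENSION.get(extension, "other")].append(name)
    return dict(groups)


print(group_by_type(["123.png", "124.GLB", "125.xyz"]))
```

Because the grouping depends only on the extension, it needs no per-project effort, unlike a categorization by case study.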
Mapping OpenAtlas file metadata to ARCHE metadata
- If files in OpenAtlas come with the same set of properties to be filled out, we may be able to establish a mapping that can be re-used across all OpenAtlas projects
Discussion
- Data model specific information can be added via acdh:hasDescription
- ARCHE also accepts, and prefers, entities with URIs from authority files (e.g. persons, places)
- task: map OpenAtlas record information to ARCHE metadata schema
- task: new feature for OpenAtlas to collect information for acdh:TopCollection
- task: Attach spreadsheet with properties to this agenda ✅
- Include some kind of description to provide something like a manual for re-using the OpenAtlas data collection
Feedback related to bITEM
Files
- The files could be structured into folders based on the 'Fallstudien' (five present)
- Files come from various sources, some in copyright without an open license -> a problem for archiving
- Image quality/resolution is poor in many cases, but then some of these are derivatives from some external source, e.g. Wikimedia Commons
- 3D file resolution is low, but most seem to be downscaled versions of models stored elsewhere, e.g. NHM data repository (Dodo) and Sketchfab
- File formats: strange choices, e.g. png for something that is a scan or a photo (e.g. 235004.png)
Metadata
- Metadata should include the sources of the files to link to something in a higher resolution
Missing
On the presentation page the files/objects are presented with additional informative text
Where will this be included?
Technical
- Progress update about functionality to prepare data for archiving, see #2372 and related issues
- What should the subject URIs look like?
- Add acdh:TopCollection (How and what exactly?) -> see metadata requirements above
- Blank subjects
Related issues: #2372, #2466, #2467, #2551
Action Points
- Once the deposition agreement is finished, we will take care of the top collection
- Bernhard generates file metadata via a script (incl. additional information, like depicted/related actors, in the description) and will provide it to the ARCHE team for feedback (#2466)
- Think about how to structure files -> one approach we will try is to structure them by file type (e.g. image, 3D model, ...)
- Create a file catalog of what information is available/possible and give it to the ARCHE team for further evaluation
- If needed/advantageous we can provide read only accounts for the ARCHE team at respective OpenAtlas instances
- Next meeting will be on 22 July, 12:00