Feature #1848
API: fetch data from ARCHE (closed)
Description
For the INDIGO project, we will import data from ARCHE to OpenAtlas.
Update
Base functionality is implemented. It is possible to fetch metadata from ARCHE and import it into OpenAtlas. The feature is still experimental and will be expanded (e.g. with a creation event to track photographers) and generalized in the near future.
CIDOC mapping (in progress)
More information available at ARCHE import
INDIGO test collection (ARCHE)
A test collection with data provided by Geert Verhoeven and Benjamin Wild was imported into the staging instance of ARCHE, hosted on Minerva. The test collection has the identifier https://id.acdh.oeaw.ac.at/indigo_test, which automatically resolves to the collection's details page on ARCHE staging (i.e., https://arche-curation.acdh-dev.oeaw.ac.at/browser/oeaw_detail/1390136).
Collection arrangement
The main collection, INDIGO Test Collection, is what ARCHE calls a Top Collection: the main folder that contains all the data related to the collection (https://id.acdh.oeaw.ac.at/indigo_test).
This contains two Collections, i.e. two "sub-folders", which correspond to the two batches of data sent by Benjamin and Geert:
- Large Ortophotos (https://id.acdh.oeaw.ac.at/indigo_test/large_orthos)
  Four large TIFF files sent by Benjamin Wild, with sizes ranging from 270 MB to 4 GB.
- Test Photos (https://id.acdh.oeaw.ac.at/indigo_test/test_photos)
  Eight test photos sent by Geert Verhoeven, including two color checkers, in different formats and with accompanying metadata files.
Each file contained in these Collections is called a Resource in the ARCHE ontology.
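As a rough illustration of the arrangement described above, the hierarchy can be modelled as a small tree. This is only a sketch: the `Collection` class and its `walk` method are hypothetical helpers, not part of ARCHE or OpenAtlas, and only the titles and identifiers mentioned in this ticket are filled in.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """One node of the collection tree (illustrative, not an ARCHE class)."""
    title: str
    identifier: str
    children: list = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, collection) pairs, parents before their children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


BASE = "https://id.acdh.oeaw.ac.at/indigo_test"

# Top Collection with its two sub-collections, as listed in this ticket.
top_collection = Collection(
    "INDIGO Test Collection",
    BASE,
    [
        Collection("Large Ortophotos", f"{BASE}/large_orthos"),
        Collection("Test Photos", f"{BASE}/test_photos"),
    ],
)

for depth, node in top_collection.walk():
    print("  " * depth + f"{node.title} ({node.identifier})")
```

The individual Resources (files) would hang below the two sub-collections in the same way; they are left out here because the ticket does not list them all.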
Test Photos
I suggest starting with the Test Photos collection, since it is currently the most complete, with different formats and metadata.
More precisely, Geert provided each picture in both JPG and NEF format (Nikon's proprietary RAW format), accompanied by an XMP sidecar file containing metadata about the picture. More information about the different metadata formats can be found in Geert's info document.
In addition, each picture was processed with ExifTool. All the metadata contained in the JPG, NEF, and XMP files were combined into a single JSON file, where each line contains a specific property with a tag identifying its metadata schema, for example "IPTC:Sub-location": "Donaukanal". These metadata files are identified by the suffix _metadata. When a metadata property did not have the same value in each of the source files (JPG, NEF, XMP), it was moved to a separate metadata file, identified by the suffix _not_unique_values.
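The combining step described above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the function names, the source labels, and every sample value except "IPTC:Sub-location": "Donaukanal" are assumptions, and the real layout of the _not_unique_values files may differ.

```python
from collections import defaultdict


def split_by_agreement(sources):
    """Merge per-file metadata and separate properties whose values disagree.

    `sources` maps a source label (e.g. "jpg") to a dict of
    "Schema:Property" -> value. Returns (unique, not_unique).
    """
    seen = defaultdict(dict)  # property -> {source label: value}
    for label, properties in sources.items():
        for key, value in properties.items():
            seen[key][label] = value

    unique, not_unique = {}, {}
    for key, by_source in seen.items():
        values = list(by_source.values())
        if all(value == values[0] for value in values):
            unique[key] = values[0]
        else:
            not_unique[key] = by_source  # keep every conflicting value
    return unique, not_unique


def group_by_schema(metadata):
    """Group "Schema:Property" keys by the schema tag before the first colon."""
    grouped = defaultdict(dict)
    for key, value in metadata.items():
        schema, separator, name = key.partition(":")
        if not separator:  # key carries no schema tag
            schema, name = "", key
        grouped[schema][name] = value
    return dict(grouped)


# Invented sample input; only the IPTC entry comes from the ticket.
sample_sources = {
    "jpg": {"IPTC:Sub-location": "Donaukanal", "EXIF:Orientation": "Horizontal"},
    "nef": {"EXIF:Orientation": "Rotate 90"},
    "xmp": {"IPTC:Sub-location": "Donaukanal"},
}
unique, not_unique = split_by_agreement(sample_sources)
print(group_by_schema(unique))
print(not_unique)
```

Here `unique` corresponds to what would go into a _metadata file and `not_unique` to what would go into a _not_unique_values file.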
Each of these metadata files is of class Metadata in the ARCHE ontology and is linked to the original file through the property acdh:isMetadataFor. You can also see the relationship in the GUI by viewing the Details page of a metadata file (e.g., https://arche-curation.acdh-dev.oeaw.ac.at/browser/oeaw_detail/1390181).
Alternatively, on the Details page of the original file (e.g., https://arche-curation.acdh-dev.oeaw.ac.at/browser/oeaw_detail/1390166), you can find this information by switching to the Expert-View (which is generally very useful for viewing more metadata about a resource) and then scrolling to the Inverse Data section.
Therefore, given INDIGO_2022-07-22_Z7II-A_0007 as the name of one picture, you can find five different resources about this picture in the Test Photos collection:
INDIGO_2022-07-22_Z7II-A_0007.jpg
INDIGO_2022-07-22_Z7II-A_0007.nef
INDIGO_2022-07-22_Z7II-A_0007.xmp
INDIGO_2022-07-22_Z7II-A_0007_metadata.json
INDIGO_2022-07-22_Z7II-A_0007_not_unique_values.json
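For an importer, the naming convention above makes it possible to group resources by picture. The following is a minimal sketch assuming only that convention; `classify` and `group_by_picture` are hypothetical helpers, not existing OpenAtlas or ARCHE functions.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Metadata suffixes described in the ticket.
METADATA_SUFFIXES = ("_not_unique_values", "_metadata")


def classify(filename):
    """Return (picture_name, role) for one resource filename."""
    path = PurePosixPath(filename)
    stem = path.stem
    extension = path.suffix.lower().lstrip(".")
    if extension == "json":
        for suffix in METADATA_SUFFIXES:
            if stem.endswith(suffix):
                return stem[: -len(suffix)], suffix.lstrip("_")
    return stem, extension


def group_by_picture(filenames):
    """Map each picture name to a {role: filename} dict."""
    grouped = defaultdict(dict)
    for filename in filenames:
        picture, role = classify(filename)
        grouped[picture][role] = filename
    return dict(grouped)


resources = [
    "INDIGO_2022-07-22_Z7II-A_0007.jpg",
    "INDIGO_2022-07-22_Z7II-A_0007.nef",
    "INDIGO_2022-07-22_Z7II-A_0007.xmp",
    "INDIGO_2022-07-22_Z7II-A_0007_metadata.json",
    "INDIGO_2022-07-22_Z7II-A_0007_not_unique_values.json",
]
for picture, files in group_by_picture(resources).items():
    print(picture, sorted(files))
```

Grouping by name is only a convenience; the authoritative link between a metadata file and its original remains the acdh:isMetadataFor property.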