Feature #2568

Admin interface for generating ARCHE dumps

Added by Bernhard Koschiček-Krombholz 6 months ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Category: Backend
Target version:
Start date: 2025-06-24
Estimated time:

Description

Create an admin interface with a magical one-click button that creates an all-inclusive ARCHE dump.

This dump should include:
  • Files sorted by extension ✅
  • Metadata for files in a ttl (#2466) ✅
  • Lists of failed files due to ARCHE restrictions (no license, no creator, no license holder, etc.) ✅
  • Stripped SQL dump ✅ (currently, all data is included)
  • RDF dump (#2551) ✅
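The first item, files sorted by extension, combined with the data/metadata/debug folder structure from the task list, could look roughly like the following minimal sketch. It is not the OpenAtlas implementation: it assumes all media files sit in one flat source directory, and the function name `build_dump_layout` and the `no_extension` fallback folder are hypothetical.

```python
import shutil
from pathlib import Path


def build_dump_layout(source: Path, target: Path) -> dict[str, int]:
    """Copy files from `source` into `target/data/<extension>/`.

    Also creates empty `metadata/` and `debug/` folders next to `data/`.
    Returns the number of copied files per extension folder.
    """
    # Hypothetical dump root layout: data/, metadata/ and debug/
    for name in ("data", "metadata", "debug"):
        (target / name).mkdir(parents=True, exist_ok=True)

    counts: dict[str, int] = {}
    for path in sorted(source.iterdir()):
        if not path.is_file():  # subfolders are ignored in this sketch
            continue
        # "photo.JPG" -> "jpg"; files without an extension get their own folder
        ext = path.suffix.lower().lstrip(".") or "no_extension"
        folder = target / "data" / ext
        folder.mkdir(exist_ok=True)
        shutil.copy2(path, folder / path.name)  # copy2 also keeps timestamps
        counts[ext] = counts.get(ext, 0) + 1
    return counts
```

Copying leaves the original upload folder untouched; moving or hard-linking would save disk space on large collections, at the cost of modifying or depending on the originals.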

One major issue is how the ARCHE metadata is stored in or transferred into OpenAtlas. Currently, the data is stored in production.py, which is awkward to handle.

Done:
  • change folder structure into data, metadata, and debug
  • add statistics to debug (how large, how many files, how many folders, etc.) ✅
  • check for duplicate files via hashes ✅
  • add the reference as a named entity of class acdh:Publication and link it with acdh:isSourceOf ✅
  • convert URL to ASCII ✅
  • Enrich the description of files ✅
    • if the linked entity has an external reference system URL, check it against arche_assets; if it is correct, add a link to a new entity (only for Actors and Places) ✅
    • if there is no external reference system, just add a named entity ✅
  • Add file checker ✅ --> functionality will be used in #2580
  • create API endpoint for ARCHE metadata ✅
  • write manual entry (what is needed, where and who can enter metadata, who can export, which file checkers are there) ✅
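The duplicate check via hashes and the debug statistics could be computed in a single pass over the dump folder, as in this sketch. It assumes SHA-256 as the hash (the hash ARCHE's own file checker uses may differ), and the names `sha256_of` and `dump_statistics` as well as the result keys are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large media files are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dump_statistics(root: Path) -> dict:
    """Count files, folders and bytes below `root` and group identical files."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    files = folders = total_bytes = 0
    for path in root.rglob("*"):
        if path.is_dir():
            folders += 1
        elif path.is_file():
            files += 1
            total_bytes += path.stat().st_size
            by_hash[sha256_of(path)].append(str(path.relative_to(root)))
    # Only hashes shared by more than one file are duplicates
    duplicates = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return {
        "files": files,
        "folders": folders,
        "total_bytes": total_bytes,
        "duplicates": sorted(duplicates),
    }
```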
Todo:
  • SQL dump with only the project data -> #2613

Files

TopCollection.xlsx (202 KB), Bernhard Koschiček-Krombholz, 2025-07-21 09:04

Related issues: 4 (1 open, 3 closed)

Related to OpenAtlas - Feature #2551: Admin interface for generating RDF dumps (Closed, Bernhard Koschiček-Krombholz, 2025-02-04)
Related to OpenAtlas - Feature #2466: API: Export files with ARCHE RDF metadata (Closed, Bernhard Koschiček-Krombholz, 2025-02-03)
Related to OpenAtlas - Feature #2580: Report generation for ARCHE import issues (Closed, Bernhard Koschiček-Krombholz, 2025-07-22)
Related to OpenAtlas - Feature #2613: ARCHE export: SQL dump (Assigned, Bernhard Koschiček-Krombholz, 2025-09-03)

#1

Updated by Bernhard Koschiček-Krombholz 6 months ago

  • Related to Feature #2551: Admin interface for generating RDF dumps added
  • Related to Feature #2466: API: Export files with ARCHE RDF metadata added
#2

Updated by Bernhard Koschiček-Krombholz 6 months ago

  • Description updated (diff)
#3

Updated by Bernhard Koschiček-Krombholz 6 months ago

The view is complete. Things to discuss:

  • Where to store the ARCHE-relevant metadata
    • store it in config.py/production.py (current solution)
    • upload a JSON/TOML/YAML file with the metadata, which is discarded at the end of the process
    • store it in the database
    • store different configurations as JSON/TOML/YAML files in files/
    • ...
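As an illustration of the "upload a file with the metadata" option, the sketch below validates an uploaded document before the export starts. JSON is used only because Python's standard library parses it without extra dependencies; the `top_collection` key, the field names and the required-key list are hypothetical and not the metadata OpenAtlas actually expects.

```python
import json

# Hypothetical uploaded metadata; field names are illustrative only
EXAMPLE = """
{
  "top_collection": {
    "id": "example_project",
    "title": "Example project",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "license_holder": "Example institute"
  }
}
"""

REQUIRED_KEYS = ("id", "title", "license", "license_holder")


def load_arche_metadata(text: str) -> dict[str, str]:
    """Parse uploaded metadata and fail early if required keys are empty or missing."""
    data = json.loads(text)
    top = data.get("top_collection", {})
    missing = [key for key in REQUIRED_KEYS if not top.get(key)]
    if missing:
        raise ValueError("Missing ARCHE metadata: " + ", ".join(missing))
    return top
```

Validating up front means a broken upload is rejected before any files are copied, and the parsed dictionary can simply be discarded after the dump is built.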
#4

Updated by Bernhard Koschiček-Krombholz 6 months ago

  • Status changed from In Progress to Resolved
#5

Updated by Bernhard Koschiček-Krombholz 5 months ago

There is a CSV for the topCollection. Maybe use this instead of the config variables.
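If the topCollection information is taken from a CSV instead of config variables, it could be read into a property-to-values mapping as sketched here. The two-column layout (`property`, `value`) and the example properties are assumptions and do not necessarily match the attached TopCollection.xlsx; values are kept as lists because a property such as a contributor may occur more than once.

```python
import csv
import io


def read_top_collection(csv_text: str) -> dict[str, list[str]]:
    """Group CSV rows into {property: [values]} for the topCollection.

    Assumes a hypothetical header row "property,value".
    """
    result: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        prop = (row.get("property") or "").strip()
        value = (row.get("value") or "").strip()
        if prop and value:  # skip rows with an empty property or value
            result.setdefault(prop, []).append(value)
    return result
```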

#6

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Status changed from Resolved to In Progress

Set it back to In Progress; there are still some things to do:

  • topCollection information as CSV will be included manually
  • change folder structure into data, metadata, and debug
  • maybe add statistics to debug (how large, how many files, how many folders, etc.) -> also used for #2580
  • maybe check for duplicate files via hashes (but ARCHE will do the same with its file checker)
  • SQL dump with only the project data
  • if there is a reference, add it as a named entity of class acdh:Publication and link it to the media file with acdh:isSourceOf
  • Enrich the description of files
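The reference handling amounts to a few RDF triples. This sketch writes them as Turtle text to stay dependency-free (a real export would more likely use an RDF library); the namespace URI bound to the `acdh:` prefix, the use of `acdh:hasTitle`, and the example URIs are assumptions to check against the ARCHE schema.

```python
# Assumed ARCHE schema namespace for the acdh: prefix
ACDH = "https://vocabs.acdh.oeaw.ac.at/schema#"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def publication_triples(reference_uri: str, title: str, file_uri: str) -> str:
    """Describe a reference as an acdh:Publication that is the source of a media file."""
    return (
        f"@prefix acdh: <{ACDH}> .\n\n"
        f"<{reference_uri}> a acdh:Publication ;\n"
        f"    acdh:hasTitle {turtle_literal(title)} ;\n"
        f"    acdh:isSourceOf <{file_uri}> .\n"
    )
```

The direction follows the task description: the publication is the subject and the media file the object of acdh:isSourceOf.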
#7

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#8

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)

More new todos:

  • convert URL to ASCII
  • Add file checker
    • No license
    • No creator
    • No license holder
    • Maybe duplicated file hash
  • write manual entry (what is needed, where and who can enter metadata, who can export, which file checkers are there)
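The file checker rules for a missing license, creator or license holder can be sketched as a pure function over simplified file records. `FileRecord` and `check_files` are hypothetical stand-ins for OpenAtlas's own entities, and the optional duplicate-hash rule is left out here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical, simplified view of a file's ARCHE-relevant metadata."""

    name: str
    license: Optional[str] = None
    creators: list[str] = field(default_factory=list)
    license_holders: list[str] = field(default_factory=list)


def check_files(files: list[FileRecord]) -> dict[str, list[str]]:
    """Return {problem: [file names]} for files that would fail ARCHE's checks."""
    problems: dict[str, list[str]] = {
        "no_license": [],
        "no_creator": [],
        "no_license_holder": [],
    }
    for record in files:
        if not record.license:
            problems["no_license"].append(record.name)
        if not record.creators:
            problems["no_creator"].append(record.name)
        if not record.license_holders:
            problems["no_license_holder"].append(record.name)
    return problems
```

Returning every problem list, even empty ones, keeps the report shape stable, so a list of failed files per reason can be written to the debug folder without special cases.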
#9

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#10

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#11

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#12

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#13

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#14

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#15

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#16

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#17

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Description updated (diff)
#18

Updated by Bernhard Koschiček-Krombholz 5 months ago

  • Related to Feature #2580: Report generation for ARCHE import issues added
#19

Updated by Bernhard Koschiček-Krombholz 4 months ago

  • Description updated (diff)
  • Status changed from In Progress to Resolved
#20

Updated by Bernhard Koschiček-Krombholz 4 months ago

#21

Updated by Bernhard Koschiček-Krombholz 3 months ago

  • Status changed from Resolved to Closed