Feature #1617
closed
CONNEC: Transform sub events links to source links
Added by Alexander Watzinger about 3 years ago.
Updated almost 3 years ago.
Description
In the course of the presentation site development we discovered relations between events and sub events in CONNEC that needs fixing to a "this letter mentioned another letter" relation.
Because they are too many to solve manually we will think about an automatic solution.
Test link provide by Becca: https://connec.acdh.oeaw.ac.at/entity/13064
Files
- Description updated (diff)
- Status changed from Assigned to In Progress
I took a look at it, there are 623 links to sub events so fixing them manually won't be an option.
After giving it some thought I think it would be best to:
- Link the artifact of the sub event to the source of the artifact of the super event
- Delete the sub event link
In the model this would than mean that in the source (of the artifact of the super event) other letters (the artifacts of the sub event) are mentioned.
I will have to dig deeper to see if all prerequisites are full filled, e.g. every event has an artifact, there are no sub events of sub events.
If everything looks ok it could be done with an SQL statement, otherwise we have to think about writing a Python script because this won't be doable just with plain SQL.
I'll keep you updated.
It took some time but I managed to come up with a pure SQL solution. We can look together into it at the next meeting.
This script can be used by an admin (e.g. me) at admin/SQL execute. The xxx have to be replaced (3 times) with the id of the "super" event and will than replace sub events links with links from the source of the super event to artifacts of the sub events.
It has it's limitation, e.g. that
- We have to do one super event at a time
- The super event must have sub events
- The super event is connected to an (moved) artifact which is connected to a source
- The sub events are connected to an (moved) artifact
In practice we can just try it out. Because code submitted in the backend SQL interface is in a transaction it will do everything (if it works ok) or nothing.
INSERT INTO model.link (domain_id, property_code, range_id)
SELECT domain.range_id, 'P67', range.range_id FROM
(SELECT event_artifact.range_id
FROM model.link sub_super
JOIN model.link event_artifact
ON sub_super.domain_id = event_artifact.domain_id
AND event_artifact.property_code = 'P25'
AND sub_super.property_code = 'P117'
WHERE sub_super.range_id = xxx) domain
,
(SELECT artifact_source.range_id
FROM model.link artifact_source
JOIN model.link event_artifact
ON artifact_source.domain_id = event_artifact.range_id
AND event_artifact.property_code = 'P25'
AND artifact_source.property_code = 'P128'
WHERE event_artifact.domain_id = xxx) range;
DELETE FROM model.link WHERE property_code = 'P117' AND range_id = xxx;
After showing how it works in today's meeting with decided:
- We will run the script on the test server so that we can take a look at the changes without risking unwanted changes in the productive system
- In cases where transformation isn't possible the "is sub event of" link will be kept and a list of them will be sent to check them
- Once we are satisfied with the result we can run the script on the productive data
I wrote a script which transforms super events where possible but leaves the one that couldn't be transformed.
The transformed database was uploaded to the test server and a list of the problematic ones (233 of 627) was sent to the CONNEC team.
Becca looked into the provided test instance and the script seems to work out ok.
But because of problems with different PostgreSQL server versions we decided to wait for the server move before running the script on the productive data.
Closing this issue because script was already executed for CONNEC data on new server.
- Status changed from In Progress to Closed
Also available in: Atom
PDF