Text annotation

Issue #2079

Text annotation has been on our radar for some time now, so I created an issue to discuss how we can proceed.
Basically it's about linking entities (actors, places, ...) to specific parts of a text, instead of only to the whole text as is currently possible.

Scope (for the first version)

  • It would be available for sources (E33) and their translations
  • Only already linked entities are offered
  • No overlapping annotations
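The "no overlapping annotations" rule could be checked before a new annotation is saved. A minimal sketch, assuming annotations are stored as character offsets where the end offset is exclusive (an assumption, the draft leaves this open); the names Span, overlaps and can_add are hypothetical:

```python
from typing import Iterable, Tuple

# (link_start, link_end) as character offsets; end is exclusive (assumption)
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two half-open ranges share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def can_add(existing: Iterable[Span], new: Span) -> bool:
    """Return True if the new annotation overlaps none of the existing ones."""
    return all(not overlaps(span, new) for span in existing)
```

With half-open ranges, two annotations that merely touch, e.g. (0, 5) and (5, 10), do not count as overlapping.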

User interface

We need a tool that lets users annotate texts. Although this is difficult to implement, editing the text and annotating it should be possible in a single form element.


Information storing

We will save the information in an extra database table. Draft for fields:

  • id (int, required) generic internal database identifier
  • source_id (int, required)
  • entity_id (int, required) the entity linked in the annotation, e.g. actor, place, artifact, ...
  • link_start (int, required)
  • link_end (int, required) or maybe store the length instead?
  • user_id (int, not required) to track who added it
  • text (text, not required) a kind of description field for textual information; the name is still open, e.g. annotation, description, text, ...

With a separate table:

  • We can guarantee that there are no orphaned links
  • It can then be used to present the annotations as e.g. HTML, TEI, Web Annotation Data Model, ...
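
The draft fields could be sketched as a table definition like the one below. This is only an illustration under several assumptions: the table name annotation_text and the stub tables entity and users are hypothetical, SQLite is used only to keep the example self-contained, and link_end is treated as an exclusive end offset. Foreign keys with cascading deletes are one possible way to avoid orphaned links.

```python
import sqlite3

# Hypothetical schema; table names and the stub tables are assumptions.
SCHEMA = """
CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE annotation_text (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES entity(id) ON DELETE CASCADE,
    entity_id INTEGER NOT NULL REFERENCES entity(id) ON DELETE CASCADE,
    link_start INTEGER NOT NULL,
    link_end INTEGER NOT NULL,
    user_id INTEGER REFERENCES users(id) ON DELETE SET NULL,
    text TEXT,
    CHECK (link_start >= 0 AND link_end > link_start)
);
"""


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched annotation table."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Deleting a source or a linked entity then removes its annotations as well, and deleting a user only clears user_id.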

Topics to discuss

  • Possibly related: annotation for images
  • How to deal with text changes in already annotated text
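
As input for the discussion about text changes, one possible approach (not a decision) is to shift the offsets of annotations that lie after an edit and to flag annotations touched by the edit for manual review. A minimal sketch, assuming exclusive end offsets and an edit described by its position, the number of removed characters and the number of inserted characters; apply_edit and its parameters are hypothetical names:

```python
from typing import List, Tuple

# (link_start, link_end) as character offsets; end is exclusive (assumption)
Span = Tuple[int, int]


def apply_edit(
    spans: List[Span], pos: int, removed: int, inserted: int
) -> Tuple[List[Span], List[Span]]:
    """Return (kept, affected) annotations after a single text edit.

    kept: annotations before the edit (unchanged) or after it (shifted).
    affected: annotations touching the edited range, left for review.
    """
    delta = inserted - removed
    edit_end = pos + removed
    kept: List[Span] = []
    affected: List[Span] = []
    for start, end in spans:
        if end <= pos:
            # Entirely before the edit, offsets stay valid.
            kept.append((start, end))
        elif start >= edit_end:
            # Entirely after the edit, shift by the change in length.
            kept.append((start + delta, end + delta))
        else:
            # The edit changes text inside the annotation.
            affected.append((start, end))
    return kept, affected
```

For example, removing two characters at position 12 leaves an annotation at (0, 5) unchanged, moves one at (20, 25) to (18, 23), and flags one at (10, 15) for review.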

Time frame

If we implement it in the course of a cooperation with ENCHANT, the time frame would be 2024: a working basic implementation in summer and a more complete version at the end of the year.

Ideas for future versions

  • Offer links to external reference systems (e.g. GeoNames), which we would use to create new entities on the fly (using available meta information, creating links to the reference system, ...). Interesting but a lot of work.
  • A tool to find possible annotation candidates providing a result list to annotate multiple occurrences in one go

Updated by Alexander Watzinger 5 months ago · 11 revisions
