Project

General

Profile

Actions

Feature #1997

closed

Performance: improve type views

Added by Stefan Eichert about 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Category:
Backend
Target version:
Start date:
2023-03-29
Estimated time:

Description

When opening/editing/saving e.g. type entities, that have many connected other entities, the performance is very poor. It takes up to one minute or even more to load the respective page. E.g. https://thanados.openatlas.eu/entity/25


Related issues 1 (0 open1 closed)

Related to OpenAtlas - Feature #1752: Add a place column for subunits to type viewClosedNina Richards2022-06-21Actions
Actions #1

Updated by Alexander Watzinger about 1 year ago

  • Tracker changed from Bug to Feature
  • Subject changed from Performance Issues to Performance: improve type views
  • Status changed from New to In Progress
  • Assignee set to Alexander Watzinger
  • Target version set to 7.12.0
  • Found in version deleted (7.12.0)

I can confirm this issue and will look into it.
Because THANADOS is using the develop branch I tested it with the main branch but it doesn't solve the issue so we can't just switch to the main branch as a quick fix.
I changed the tracker of this issue from bug to feature because technically it is not an error but a combined effect of adding features and THANADOS adding data. Performance improvement is an ongoing task where we are reacting (and depending) on reports of projects using OpenAtlas.

Nevertheless sorry for the inconvenience and thank you for reporting. We try to find a solution as fast as possible.

Actions #2

Updated by Alexander Watzinger about 1 year ago

Like suspected the reason was a combination of a feature and growing data. Because of #1752 (Add a place column for subunits to type view) and a lot of linked entities the view was slow because looking up the "root" place for every entry takes a while.
As a quick fix I disabled this feature and it's much faster now. I also switched THANADOS to the release_candidate branch which is safer for now.

I understand that the place column is very handy so I will look into it. But I'm not sure if I manage to solve the related performance issues before the next release in the upcoming weekend.
Worst case scenario is that we release without the #1752 feature which is still better than to have an unusable slow system.

I will keep you updated.

Actions #3

Updated by Alexander Watzinger about 1 year ago

  • Related to Feature #1752: Add a place column for subunits to type view added
Actions #4

Updated by Stefan Eichert about 1 year ago

Thank you for the quick workaround!

Actions #5

Updated by Alexander Watzinger about 1 year ago

  • Target version changed from 7.12.0 to 7.15.0

I looked with Bernhard into this issues. It seems if me manage to get all "root" places for every entity in one go with a recursive function, it would still be fast enough.
While implementing we ran into some SQL troubles when trying a recursive function that only returns the "top" place which should be possible in theory, see: https://stackoverflow.com/questions/66281034/postgresql-select-only-last-row-from-each-recursion

Sadly this issue turned up too close to the release this weekend. So for now we have to leave the function to display root places at type views deactivated but will look into it before the next release.

Actions #6

Updated by Stefan Eichert about 1 year ago

If you could post details on the desired result and the data sources (tables, fields etc.) I will design the recursive query.

Actions #7

Updated by Alexander Watzinger about 1 year ago

Thanks Stefan, this would be great. So we are talking about the model.link table, where we can look up the P46 links to find the "top" place of an entry.
What we have so far is something like this:
WITH RECURSIVE items AS (
    SELECT range_id, domain_id
    FROM model.link
    WHERE range_id IN (177676, 145352) AND property_code = 'P46'
    UNION
    SELECT l.range_id, l.domain_id FROM model.link l
    INNER JOIN items i ON
    l.range_id = i.domain_id
    AND l.property_code = 'P46'
) SELECT range_id, domain_id FROM items;

In this example we only use the id of an artifact (177676) and a human remains (145352) from the THANADOS data set which will than be replaced with all the ids from the table of connected entities at a type view.
The result would look like this:
range_id domain_id
177676 176874
176874 176734
176734 176118
145352 81239
81239 63533
63533 47361
Which shows all connections leading to the top P46 hierarchy (the place). But what we would need is an output that shows only the id of e.g. the artifact and the corresponding place_id (the bold entries). So that it looks like this:
entity_id place_id
177676 176118
145352 47361

This should be possible, we looked at an working example given here: https://thanados.openatlas.eu/entity/25. But table structure is different and we didn't manage to adapt it to work for our case too.
Help with that would be appreciated. Feel free to ask if more information is needed.
Bonus points if the result also contains the place name, than we could use that to build a link :)

Actions #8

Updated by Stefan Eichert about 1 year ago

This should do the job:

WITH RECURSIVE parent_tree AS (
  SELECT p.parent_id, p.child_id, ARRAY [p.child_id] AS path, 1 AS depth
  FROM (SELECT domain_id as parent_id, range_id as child_id FROM model.link WHERE property_code = 'P46') p
  WHERE p.child_id IN (177676, 145352, 81239, 176118)
  UNION ALL
  SELECT t.parent_id, t.child_id, pt.path || ARRAY [t.child_id], pt.depth + 1
  FROM (SELECT domain_id as parent_id, range_id as child_id FROM model.link WHERE property_code = 'P46') t
  JOIN parent_tree pt ON pt.parent_id = t.child_id
), 
root_nodes AS (
  SELECT DISTINCT ON (path[1]) path[1] AS child_id, parent_id AS top_level
  FROM parent_tree
  WHERE parent_id IS NOT NULL
  ORDER BY path[1], depth DESC
)
SELECT DISTINCT a.child_id AS start_node, r.top_level, e.name
FROM root_nodes r
JOIN parent_tree a ON a.child_id = r.child_id
JOIN model.entity e ON e.id = r.top_level
ORDER BY a.child_id

It gets the top level parent of a respective starting point entity and the name of that top level entity.

The array of child_ids, in this example "(177676, 145352, 81239, 176118)" should be modified to your needs as well as the property code.

Also if the property is inverse e.g. if the domain_id does not mark the parent but the range_id (e.g. for P127) then exchange domain_id and range_id in the subquery:

WITH RECURSIVE parent_tree AS (
  SELECT p.parent_id, p.child_id, ARRAY [p.child_id] AS path, 1 AS depth
  FROM (SELECT range_id as parent_id, domain_id as child_id FROM model.link WHERE property_code = 'P127') p
  WHERE p.child_id IN (26469, 26467)
  UNION ALL
  SELECT t.parent_id, t.child_id, pt.path || ARRAY [t.child_id], pt.depth + 1
  FROM (SELECT range_id as parent_id, domain_id as child_id FROM model.link WHERE property_code = 'P127') t
  JOIN parent_tree pt ON pt.parent_id = t.child_id
), 
root_nodes AS (
  SELECT DISTINCT ON (path[1]) path[1] AS child_id, parent_id AS top_level
  FROM parent_tree
  WHERE parent_id IS NOT NULL
  ORDER BY path[1], depth DESC
)
SELECT DISTINCT a.child_id AS start_node, r.top_level, e.name
FROM root_nodes r
JOIN parent_tree a ON a.child_id = r.child_id
JOIN model.entity e ON e.id = r.top_level
ORDER BY a.child_id

Actions #9

Updated by Alexander Watzinger about 1 year ago

  • Status changed from In Progress to Closed
  • Target version changed from 7.15.0 to 7.12.0

Thanks a lot Stefan, this is great and also cool that you got it so quick so that we can add it to the release.
It's already in branch release_candidate and also at THANADOS, so https://thanados.openatlas.eu/entity/25 is much faster now.

I implemented it in a way that you can pass a property, but will also implement "inverse" direction and the like when we have a need for it.

Actions #10

Updated by Alexander Watzinger about 1 year ago

I decided to make the function more generalized now when already at it.
So now we have the function get_roots() at model/entity which takes a property code, a list of ids and an inverse flag as parameters.
It returns a dict with the provided ids as keys and a dict with the root id and root name (needed to build links) as values. For root places it would look like this.

Entity.get_roots('P46', [123, 128], inverse=True)
{
   123: {'id': 5, 'name': 'first_root_name'},
   128: {'id': 8, 'name': 'second_root_name'}
}

Thanks again Stefan for support with the SQL statement, it works great!

Actions

Also available in: Atom PDF