Description
This issue is to finish the investigation started in #8097.
I will copy and paste the relevant experimental data and discussion from the corresponding PR, #8152.
The short version: it takes about 6 minutes to index a production dataset with 25K files directly (i.e., via /api/admin/index/dataset), but the time goes up to 6 hours for the same dataset during an asynchronous reindex (via /api/admin/index or /api/admin/index/continue). The difference between the two scenarios appears to depend on where the dataset entity is instantiated relative to the main transaction (this is explained in more detail in the comments from #8152 below). There must be some rational explanation for this, related to how the EJB container manages the transaction context. There's a good chance the same issue affects performance elsewhere in the code, wherever we have to modify datasets with similar numbers of files.