Remote model service option, and frontend fixes for overlapping, multiple annos per spans#264
Remote model service option, and frontend fixes for overlapping, multiple annos per spans#264tomolopolis merged 3 commits intomainfrom
Conversation
… for more than one annotation
…running inference on remote models for prepare_doc. Online learning not yet supported for this project setup type
There was a problem hiding this comment.
This PR is being reviewed by Cursor Bugbot
Details
You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
| self.start_char_index = entity_data.get('start', 0) | ||
| self.end_char_index = entity_data.get('end', 0) | ||
| self.text = entity_data.get('detected_name') or entity_data.get('source_value', '') | ||
| self.context_similarity = entity_data.get('context_similarity', entity_data.get('acc', 0.0)) |
There was a problem hiding this comment.
Bug: Missing similarity score marks all annotations as deleted
When using the remote model service, RemoteEntity.context_similarity defaults to 0.0 if the API response doesn't include context_similarity or acc fields. Since the hardcoded threshold is 0.3, any entity without these fields will have ent.context_similarity < MIN_ACC evaluate to true, causing the annotation to be marked as deleted=True and validated=True. This silently marks all such annotations as deleted and validated (appearing as human-reviewed) when they should be presented for manual review.
Additional Locations (1)
There was a problem hiding this comment.
intentional, remote model service models should set this
| raise ValidationError('Must set a DeID ModelPack for De-ID Model Annotation, cannot only set a cdb / vocab pair as' | ||
| ' not be a DeId model') | ||
| elif self.use_model_service and not self.model_service_url: | ||
| raise ValidationError('When using model service, model_service_url must be set') |
There was a problem hiding this comment.
Bug: DeID flag silently ignored for remote model service
When use_model_service is True, the deid_model_annotation flag is completely ignored. The model validation skips the DeID check when using remote service, allowing users to enable both options. However, the document processing code only checks deid_model_annotation in the local model path, meaning DeID processing never occurs for remote service projects even when the flag is set. Users who configure a project expecting De-identification processing will silently get standard processing instead.
Additional Locations (2)
mart-r
left a comment
There was a problem hiding this comment.
Some minor comments and questions, perhaps a few nice to haves. But overall looks alright to me.
| 'if a model pack is used for the project') | ||
| relations = models.ManyToManyField('Relation', blank=True, default=None, | ||
| help_text='Relations that will be available for this project') | ||
| use_model_service = models.BooleanField(default=False, |
There was a problem hiding this comment.
Is there a need for both the boolean and the URL? Couoldn't we just assume that if there's a URL, it needs to be used?
There was a problem hiding this comment.
agree - but in prep for when the admin - project setup is re-done, having a checkbox and some validation on a URL or some auto-generated URL etc. easier to keep explicit for now
| """ | ||
| spacy_doc.linked_ents.sort(key=lambda x: len(x.text), reverse=True) | ||
|
|
||
| tkns_in = [] |
There was a problem hiding this comment.
This is no longer used. I suppose that's because we now DO allow entities that cover the same tokens.
There was a problem hiding this comment.
not sure what you mean by its not used? its used views.py and utils.py
There was a problem hiding this comment.
The variable defined on this line is not used anywhere else. It used to be used. But with the removal of code lower down (when iterating over the entities), this is no longer ever used.
| logger.info('Using remote model service in bg process for project: %s', project.id) | ||
| filters = SimpleFilters(cuis=cuis) | ||
| for doc in docs: | ||
| logger.info(f'Running remote MedCAT service for project {project.id}:{project.name} over doc: {doc.id}') |
There was a problem hiding this comment.
The other logs above have lazy formatting, this one uses an f-string. Perhaps keep things lazy?
| except FileNotFoundError: | ||
| logger.warning('Missing CUI filter file for project %s', project.id) | ||
|
|
||
| if project.use_model_service: |
There was a problem hiding this comment.
The only 2 differences between these 2 if clauses seem to be:
- Getting the
spacy_doc - Passing different variables to
add_annotations
Perhaps we can avoid some of the code duplication by:
- Using a
functools.partialonadd_annotationsto pass required / changed kwargs - Using a callable for getting the
spacy_doc - Change the logged message to be dynamic
- Running everything else just once
- While using the partial and the callable
Just a nice to have, really.
| cat.config.components.linking.filters.cuis = cuis | ||
|
|
||
| for doc in docs: | ||
| logger.info(f'Running MedCAT model for project {project.id}:{project.name} over doc: {doc.id}') |
There was a problem hiding this comment.
Same comment here regarding f-strings vs lazy log record formatting.
|
|
||
| if not project.deid_model_annotation: | ||
| spacy_doc = cat(document.text) | ||
| if project.use_model_service: |
There was a problem hiding this comment.
Same comment here regarding potential use of partials and callables. Though admittedly a lot less actual duplication here.
| def _submit_document(project: ProjectAnnotateEntities, document: Document): | ||
| if project.train_model_on_submit: | ||
| if project.train_model_on_submit and not project.use_model_service: | ||
| # interim model training not supported for remote model service projects |
There was a problem hiding this comment.
Right now it's a silent failure. Is that expected?
There was a problem hiding this comment.
logger warning and a TODO
Note
Adds remote MedCAT service processing (with URL/config validation) and updates the UI to handle overlapping annotations via badges/popovers, alongside backend support and minor API adjustments.
use_model_serviceandmodel_service_urltoProjectAnnotateEntitiesandProjectGroupwith validation; bypass local model requirements when enabled.call_remote_model_service(requests) and lightweightRemoteSpacyDoc/RemoteEntitywrappers.prep_docs,prepare_documents,add_annotations) to support remote/local paths and CUI filters; allow overlapping manual annotations; remove CAT dependency fromcreate_annotation._submit_document,add_conceptaccordingly._PROJECT_ANNO_ENTS_SETTINGS_FIELD_ORDER).0093_add_remote_model_service_fieldsadds the new fields._common.scss.requests.Written by Cursor Bugbot for commit 378634b. This will update automatically on new commits. Configure here.