
feat(exports): add machine prediction, verification, and detection fields#1214

Open
mihow wants to merge 2 commits into main from feat/export-machine-prediction-fields

Conversation


@mihow mihow commented Apr 8, 2026

Summary

Separates machine predictions from human identifications in exports and the API, so researchers can see both side by side. Previously the determination was overwritten when a human verified an occurrence, losing the original ML prediction.

  • Extracted find_best_prediction() and find_best_identification() from update_occurrence_determination() for reuse by exports and API
  • Set determination_score to None for human-determined occurrences (ML confidence preserved in best_machine_prediction_score)
  • Added new CSV export fields and best_machine_prediction nested object to the API
  • Addressed all CodeRabbit review feedback: N+1 query fixes, distinct=True on counts, PII removal, withdrawn ID filtering, timezone-safe tests

Closes #1213
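The "human wins, score cleared" rule above can be sketched framework-free. This is a hypothetical stand-in (plain dicts, a made-up `resolve_determination()` helper) for the real Django logic in `update_occurrence_determination()`; the actual selection criteria for the best identification may differ.

```python
def resolve_determination(identifications, predictions):
    """Return (taxon_name, determination_score) for an occurrence.

    A non-withdrawn human identification always wins and clears the score;
    otherwise the highest-scoring ML prediction wins and keeps its confidence.
    """
    active_ids = [i for i in identifications if not i.get("withdrawn")]
    if active_ids:
        # Most recent human identification wins (an assumption for this sketch).
        best = max(active_ids, key=lambda i: i["created_at"])
        # Human ID score is not meaningful for determination_score.
        return best["taxon"], None
    if predictions:
        best = max(predictions, key=lambda p: p["score"])
        return best["taxon"], best["score"]
    return None, None

print(resolve_determination([], [{"taxon": "Idia aemula", "score": 0.881}]))
# ('Idia aemula', 0.881)
```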

New CSV export columns

| Column | Example value | Description |
|---|---|---|
| best_machine_prediction_name | Idia aemula | Top ML prediction taxon |
| best_machine_prediction_algorithm | moth-classifier-v2 | Algorithm that produced the prediction |
| best_machine_prediction_score | 0.881 | ML confidence score |
| verified_by | Jane Smith | Name of the most recent human identifier |
| verified_by_count | 2 | Number of non-withdrawn identifications |
| agreed_with_algorithm | moth-classifier-v2 | Algorithm name if the human explicitly agreed |
| determination_matches_machine_prediction | True | Does the human ID match the ML prediction? |
| best_detection_bbox | [0.1, 0.1, 0.5, 0.5] | Bounding box coordinates |
| best_detection_source_image_url | https://s3.../image.jpg | Public URL to the original capture |
| best_detection_occurrence_url | https://app.../sessions/123?capture=456&occurrence=789 | Platform UI link |

Example CSV rows

ML prediction only (no human verification):

```csv
id,determination_name,determination_score,best_machine_prediction_name,best_machine_prediction_score,verified_by,verified_by_count,determination_matches_machine_prediction
993,Idia aemula,0.881,Idia aemula,0.881,,0,
```

Human agrees with ML:

```csv
id,determination_name,determination_score,best_machine_prediction_name,best_machine_prediction_score,verified_by,verified_by_count,agreed_with_algorithm,determination_matches_machine_prediction
993,Idia aemula,,Idia aemula,0.881,Jane Smith,1,moth-classifier-v2,True
```

Human disagrees with ML:

```csv
id,determination_name,determination_score,best_machine_prediction_name,best_machine_prediction_score,verified_by,verified_by_count,determination_matches_machine_prediction
993,Catocala relicta,,Idia aemula,0.881,Jane Smith,1,False
```
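Since the match flag is derivable from the name columns, CSV consumers can also recompute it themselves. A hypothetical stdlib-only helper (the column names match the export; the function itself is not part of this PR):

```python
import csv
import io

def matches_machine_prediction(row: dict) -> str:
    """Derive the match flag from the name columns; blank when no human ID exists."""
    if not row["verified_by"]:
        return ""  # no human verification, flag left blank as in the export
    return str(row["determination_name"] == row["best_machine_prediction_name"])

# Minimal example row mirroring the "human disagrees" case above:
csv_text = (
    "id,determination_name,best_machine_prediction_name,verified_by\n"
    "993,Catocala relicta,Idia aemula,Jane Smith\n"
)
row = next(csv.DictReader(io.StringIO(csv_text)))
print(matches_machine_prediction(row))  # False
```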

New API field: best_machine_prediction

Added to OccurrenceListSerializer — always populated regardless of verification status:

```json
{
  "id": 993,
  "determination": {"id": 4306, "name": "Catocala relicta"},
  "determination_score": null,
  "best_machine_prediction": {
    "taxon": {"id": 4205, "name": "Idia aemula"},
    "algorithm": {"id": 3, "name": "moth-classifier-v2"},
    "score": 0.881,
    "determination_matches_machine_prediction": false
  },
  "identifications": [...]
}
```
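A framework-free sketch of how such a nested object can be assembled. The helper and the flat field names on `prediction` are hypothetical (the real serializer lives in ami/main/api/serializers.py); the point is that the match flag comes from comparing the determination's taxon id with the prediction's taxon id.

```python
def build_best_machine_prediction(prediction, determination_taxon_id):
    """Shape the nested API object from a plain prediction record (or None)."""
    if prediction is None:
        return None
    return {
        "taxon": {"id": prediction["taxon_id"], "name": prediction["taxon_name"]},
        "algorithm": {"id": prediction["algorithm_id"], "name": prediction["algorithm_name"]},
        "score": prediction["score"],
        # Match flag: compare ids, not names.
        "determination_matches_machine_prediction": prediction["taxon_id"] == determination_taxon_id,
    }

example = {
    "taxon_id": 4205,
    "taxon_name": "Idia aemula",
    "algorithm_id": 3,
    "algorithm_name": "moth-classifier-v2",
    "score": 0.881,
}
# Human picked a different taxon (arbitrary id 4306), so the flag is False:
print(build_best_machine_prediction(example, determination_taxon_id=4306)["determination_matches_machine_prediction"])
# False
```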

Test plan

  • 7 new export field tests covering: ML-only, agreeing human, disagreeing human, multiple identifications, bbox, field presence, occurrence URL
  • All 16 export tests pass (2 deselected are pre-existing JSON exporter DisallowedHost failures)
  • All 11 occurrence-related main tests pass
  • All 47 ML tests pass
  • CI tests pass

Follow-up

  • Add stale determination data integrity check to ami/main/checks.py (see feat: add check_occurrences for occurrence data integrity #1188)
  • Create management command to backfill determination_score = null for human-determined occurrences
  • Tighten legacy backfill condition in Occurrence.save() to skip human-determined occurrences
  • Fix type mismatch in update_occurrence_determination(): current_determination can be an ID or a Taxon object

🤖 Generated with Claude Code


Separate machine predictions from human identifications in exports and API,
so researchers see both side-by-side. Previously the determination was
overwritten when a human verified, losing the original ML prediction.

Model layer:
- Extract find_best_prediction() and find_best_identification() from
  update_occurrence_determination() for reuse by exports and API
- Set determination_score to None for human-determined occurrences
  (ML confidence preserved in best_machine_prediction_score)

New CSV export fields:
- best_machine_prediction_name, _algorithm, _score
- verified_by, verified_by_count, agreed_with_algorithm
- determination_matches_machine_prediction
- best_detection_bbox, best_detection_source_image_url,
  best_detection_occurrence_url

API changes:
- Add best_machine_prediction nested object to OccurrenceListSerializer
  (always populated regardless of verification status)

Closes #1213

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 8, 2026 02:07

netlify bot commented Apr 8, 2026

Deploy Preview for antenna-ssec canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | d6853f5 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-ssec/deploys/69d5bc06f6f0120008aedf8f |


netlify bot commented Apr 8, 2026

Deploy Preview for antenna-preview canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | d6853f5 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/69d5bc06be6ccd00081122f7 |


coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

This PR separates machine predictions from human verifications in CSV exports and the API by introducing new annotated fields that expose ML confidence scores, verification metadata, and detection details. The implementation refactors determination computation to preserve original ML predictions and adds comprehensive query optimization annotations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Model Layer<br>`ami/main/models.py` | Refactored `Occurrence` to extract `find_best_prediction()` and `find_best_identification()` methods. Added queryset annotations `with_best_machine_prediction()` and `with_verification_info()` to efficiently compute ML prediction attributes (name, score, algorithm) and verification details (verified_by, count, agreement status), and extended `with_best_detection()` with source image metadata. |
| API Serializer<br>`ami/main/api/serializers.py` | Extended `OccurrenceListSerializer` with a nested `best_machine_prediction` field containing taxon, algorithm, score, and determination match comparison; retrieves data via the model's `find_best_prediction()` method. |
| Export Serializer<br>`ami/exports/format_types.py` | Expanded `OccurrenceTabularSerializer` with 9 new CSV columns: machine prediction fields (`best_machine_prediction_name`, algorithm, score), verification fields (`verified_by`, `verified_by_count`, `agreed_with_algorithm`, `determination_matches_machine_prediction`), and detection fields (`best_detection_bbox`, source_image_url, occurrence_url). Updated `get_verification_status()` to prioritize `verified_by_count` annotations. |
| Export Tests<br>`ami/exports/tests.py` | Added `ExportNewFieldsTest` class with helper methods and 8 test cases validating ML-only exports, human verification scenarios, multiple identifications aggregation, detection metadata, and CSV header completeness. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Serializer as OccurrenceTabularSerializer
    participant QuerySet as OccurrenceQuerySet
    participant Occurrence
    participant Annotations as Database Annotations

    Client->>Serializer: Export CSV request
    Serializer->>QuerySet: Build queryset with annotations
    QuerySet->>Annotations: .with_best_machine_prediction()
    QuerySet->>Annotations: .with_verification_info()
    QuerySet->>Annotations: .with_best_detection()
    Annotations-->>QuerySet: Annotated rows

    Serializer->>Occurrence: get_verified_by(obj)
    Serializer->>Occurrence: get_agreed_with_algorithm(obj)
    Serializer->>Occurrence: get_determination_matches_machine_prediction(obj)
    Occurrence-->>Serializer: Computed field values

    Serializer->>Serializer: get_best_detection_source_image_url()
    Serializer->>Serializer: get_best_detection_occurrence_url()

    Serializer-->>Client: CSV with prediction + verification + detection fields
```
```mermaid
sequenceDiagram
    actor Client
    participant API as OccurrenceListSerializer
    participant Occurrence as Occurrence.find_best_prediction()
    participant Classification as Classification (ML)
    participant Taxon
    participant Algorithm

    Client->>API: GET /occurrences/{id}/
    API->>Occurrence: Call find_best_prediction()
    Occurrence->>Classification: Query best (terminal first, highest score)
    Classification-->>Occurrence: Classification object
    Occurrence-->>API: Classification | None

    alt Prediction exists
        API->>Taxon: Serialize taxon from prediction
        API->>Algorithm: Lazy-import & serialize algorithm
        API->>API: Compare determination_id vs prediction.taxon_id
        API-->>Client: {taxon, algorithm, score, determination_matches_machine_prediction}
    else No prediction
        API-->>Client: null
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes


Poem

🐰 Predictions separate from checks,
Now humans and algorithms flex—
Side-by-side, both scores appear,
No overwrites, the picture's clear!
ML confidence preserved bright,
With verification insight. 🔍

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.38%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: adding machine prediction, verification, and detection fields to exports. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #1213 are met: new export fields added, API serializer updated, model refactoring complete, and comprehensive tests included. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #1213 objectives: export fields, API enhancement, model refactoring, and test coverage for new functionality. |
| Description check | ✅ Passed | The PR description provides comprehensive context: detailed summary, list of changes, related issue reference, explanation of how changes solve the problem, test results, and relevant notes. |




Copilot AI left a comment


Pull request overview

This PR updates occurrence exports and the API to preserve and expose both machine predictions and human verifications side-by-side, preventing ML predictions from being lost when a human identification is added.

Changes:

  • Refactors occurrence logic to expose reusable “best prediction” and “best identification” selection methods and adjusts determination score semantics for human IDs.
  • Extends CSV exports with new machine prediction, verification, and detection-related fields backed by queryset annotations.
  • Adds a best_machine_prediction nested object to the occurrences list API serializer.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| ami/main/models.py | Adds queryset annotations for best detection/prediction/verification and refactors best prediction/identification selection + determination score update logic. |
| ami/main/api/serializers.py | Adds best_machine_prediction field to occurrence list API responses. |
| ami/exports/format_types.py | Adds new CSV export fields and wires exporter queryset to include new annotations. |
| ami/exports/tests.py | Adds tests covering the new CSV export fields across ML-only and human verification scenarios. |


Comment on lines 3206 to +3210

```diff
 )
 new_determination = None
 new_score = None

-top_identification = occurrence.best_identification
+top_identification = occurrence.find_best_identification()
```
Copilot AI Apr 8, 2026


update_occurrence_determination() mixes types for current_determination: when current_determination isn’t passed, it is loaded via .values("determination") (an integer id), but later compared against Taxon objects (top_identification.taxon != current_determination). This makes the comparisons always true and can cause unnecessary updates / incorrect branching (e.g., when called from Identification.save() which currently calls update_occurrence_determination(self.occurrence) without passing current_determination). Consider consistently comparing ids (e.g., load current_determination_id via values_list) or loading the actual Taxon object.



Claude says: Valid point about the type mismatch. This is a pre-existing issue in update_occurrence_determination() — the .values('determination') returns an ID while comparisons use Taxon objects. Filed as a follow-up since fixing it changes existing behavior beyond this PR's scope.

Comment on lines 3210 to 3214

```diff
 top_identification = occurrence.find_best_identification()
 if top_identification and top_identification.taxon and top_identification.taxon != current_determination:
     new_determination = top_identification.taxon
-    new_score = top_identification.score
+    new_score = None  # Human ID score is not meaningful for determination_score
 elif not top_identification:
```

Copilot AI Apr 8, 2026


When a human identification exists, determination_score should be cleared even if the identification taxon matches the existing determination (common case: human verifies the same species). Currently new_determination / new_score are only set when top_identification.taxon != current_determination, so determination_score may remain as the previous ML confidence. Consider separating “score update” from “determination change”: if top_identification exists, set new_score = None (and update occurrence.determination_score) regardless of whether the determination changes.



Claude says: Fixed in d6853f5 — update_occurrence_determination now always sets new_score based on the current authority (human or ML), regardless of whether the taxon changed.

Comment on lines 3092 to +3098

```python
def get_determination_score(self) -> float | None:
    """
    Return the determination score for this occurrence.

    Human identifications return None (score of 1.0 is meaningless).
    Machine predictions return the classification confidence score.
    """
```

Copilot AI Apr 8, 2026


get_determination_score() now returns None for any occurrence with a non-withdrawn human identification. Be aware this interacts with the legacy backfill logic in Occurrence.save() (it recomputes and warns when determination_score is falsy); with the new semantics, human-determined occurrences can legitimately have determination_score=None, so the backfill/warning path may run repeatedly. Consider tightening the save-time backfill condition (e.g., only run for legacy ML-determined occurrences) to avoid noisy warnings and extra work.



Claude says: Good catch. The legacy backfill in save() will now run for human-determined occurrences where determination_score is legitimately None. This is a pre-existing concern — the backfill should be tightened to only apply to legacy ML-determined occurrences. Adding as a follow-up TODO.

```python
verified_by_email=models.Subquery(best_identification_subquery.values("user__email")[:1]),
verified_by_count=models.Count(
    "identifications",
    filter=Q(identifications__withdrawn=False),
```

Copilot AI Apr 8, 2026


verified_by_count is annotated with Count("identifications", filter=...) without distinct=True. This queryset is typically combined with other annotations that join additional relations (e.g. with_detections_count() joins detections), which can inflate the identification count due to join multiplication. Use distinct=True on the count (or restructure annotations) to keep verified_by_count accurate.

Suggested change

```diff
-filter=Q(identifications__withdrawn=False),
+filter=Q(identifications__withdrawn=False),
+distinct=True,
```



Claude says: Fixed in d6853f5 — added distinct=True to prevent join multiplication.
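The bug `distinct=True` guards against is SQL join row multiplication. A plain-Python stand-in for the joined rows (not Django code; relation names are illustrative) shows why an unfiltered count over a multi-join queryset inflates:

```python
# One occurrence with 3 detections and 2 identifications.
occurrence_detections = ["det1", "det2", "det3"]
occurrence_identifications = ["id1", "id2"]

# Joining both relations yields one row per (detection, identification) pair,
# which is what the database produces before aggregation:
joined_rows = [(d, i) for d in occurrence_detections for i in occurrence_identifications]

naive_count = len([i for (_, i) in joined_rows])     # COUNT(...): counts duplicates
distinct_count = len({i for (_, i) in joined_rows})  # COUNT(DISTINCT ...): correct
print(naive_count, distinct_count)  # 6 2
```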

Comment on lines +156 to +160

```python
"""Returns the name or email of the user who made the best identification."""
name = getattr(obj, "verified_by_name", None)
if name:
    return name
return getattr(obj, "verified_by_email", None)
```

Copilot AI Apr 8, 2026


with_verification_info() annotates verified_by_email and OccurrenceTabularSerializer.get_verified_by() falls back to exporting that email address. This likely exposes PII in exports; the codebase explicitly notes that user serializers exclude email for privacy (see ami/users/api/serializers.py:12). Consider exporting a non-PII identifier instead (e.g., display name only, or user id), or making the email fallback opt-in/admin-only.

Suggested change

```diff
-"""Returns the name or email of the user who made the best identification."""
-name = getattr(obj, "verified_by_name", None)
-if name:
-    return name
-return getattr(obj, "verified_by_email", None)
+"""Returns the non-PII display name of the user who made the best identification."""
+return getattr(obj, "verified_by_name", None)
```



Claude says: Fixed in d6853f5 — removed the email fallback entirely, now returns name only.

```python
count = getattr(obj, "verified_by_count", None)
if count is not None:
    return "Verified" if count > 0 else "Not verified"
return "Verified" if obj.identifications.exists() else "Not verified"
```

Copilot AI Apr 8, 2026


get_verification_status() documents “non-withdrawn identifications”, but the fallback path uses obj.identifications.exists() which includes withdrawn identifications. If this serializer is used without the verified_by_count annotation, it can incorrectly mark occurrences as Verified. Consider filtering identifications__withdrawn=False in the fallback check to match the method’s stated behavior.

Suggested change

```diff
-return "Verified" if obj.identifications.exists() else "Not verified"
+return (
+    "Verified"
+    if obj.identifications.filter(withdrawn=False).exists()
+    else "Not verified"
+)
```



Claude says: Fixed in d6853f5 — fallback path now filters withdrawn=False.

Comment on lines +202 to +204

```python
def get_best_detection_occurrence_url(self, obj):
    """Returns the platform UI link to the occurrence in context."""
    return obj.context_url()
```

Copilot AI Apr 8, 2026


best_detection_occurrence_url currently delegates to obj.context_url(), which computes the URL by querying obj.best_detection (extra DB query per row) and embeds a hard-coded https://app.preview.insectai.org/... domain. Now that this value is exported for many occurrences, this can cause N+1 queries and environment-specific/incorrect links in production. Consider building the URL from already-annotated best-detection fields (or joining the needed event/capture ids in the export queryset) and sourcing the UI base URL from settings.



Claude says: Fixed in d6853f5 — occurrence URL is now built from annotated event_id and source_image_id fields, avoiding the N+1 query from context_url().
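The fix amounts to composing the URL from already-annotated ids plus a configurable base URL instead of calling `context_url()` per row. A hypothetical sketch (the helper name and the idea of sourcing `base_url` from settings are assumptions; the URL shape matches the `best_detection_occurrence_url` example in the PR description):

```python
def occurrence_context_url(base_url: str, event_id: int, source_image_id: int, occurrence_id: int) -> str:
    """Build the platform UI link from already-annotated ids (no extra DB queries)."""
    return f"{base_url}/sessions/{event_id}?capture={source_image_id}&occurrence={occurrence_id}"

print(occurrence_context_url("https://app.example.org", 123, 456, 789))
# https://app.example.org/sessions/123?capture=456&occurrence=789
```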

Comment on lines +1396 to +1401

```python
def get_best_machine_prediction(self, obj: Occurrence):
    """Always return the best machine prediction, regardless of human verification status."""
    context = self.context
    context["occurrence"] = obj

    prediction = obj.find_best_prediction()
```

Copilot AI Apr 8, 2026


OccurrenceListSerializer.get_best_machine_prediction() calls obj.find_best_prediction() per occurrence, which will hit the database for each item in list views (the viewset queryset doesn’t prefetch/annotate predictions). This introduces an N+1 query pattern on the occurrences list endpoint. Consider annotating the best prediction in the queryset (similar to the export annotations) and/or prefetching the required relations so this method can avoid per-row queries.

Suggested change

```diff
-def get_best_machine_prediction(self, obj: Occurrence):
-    """Always return the best machine prediction, regardless of human verification status."""
-    context = self.context
-    context["occurrence"] = obj
-    prediction = obj.find_best_prediction()
+def _get_best_machine_prediction_instance(self, obj: Occurrence):
+    """
+    Resolve the best machine prediction from data already loaded on the occurrence.
+
+    This avoids issuing a per-row query from the serializer. Prefer queryset
+    annotations/cached attributes, then fall back to prefetched Classification
+    relations if they are available.
+    """
+    for attr_name in (
+        "best_machine_prediction_instance",
+        "_best_machine_prediction",
+        "annotated_best_machine_prediction",
+    ):
+        prediction = getattr(obj, attr_name, serializers.empty)
+        if prediction is not serializers.empty:
+            return prediction
+    prediction = getattr(obj, "best_prediction", serializers.empty)
+    if prediction is not serializers.empty:
+        return prediction
+    prefetched_objects_cache = getattr(obj, "_prefetched_objects_cache", {})
+    for field in obj._meta.get_fields():
+        if not getattr(field, "one_to_many", False):
+            continue
+        if getattr(field, "related_model", None) is not Classification:
+            continue
+        accessor_name = field.get_accessor_name()
+        if accessor_name not in prefetched_objects_cache:
+            continue
+        predictions = prefetched_objects_cache[accessor_name]
+        if not predictions:
+            return None
+        return max(
+            predictions,
+            key=lambda item: (
+                item.score is not None,
+                item.score if item.score is not None else float("-inf"),
+                item.pk or 0,
+            ),
+        )
+    return None
+
+def get_best_machine_prediction(self, obj: Occurrence):
+    """Always return the best machine prediction, regardless of human verification status."""
+    context = self.context
+    context["occurrence"] = obj
+    prediction = self._get_best_machine_prediction_instance(obj)
```



Claude says: Fixed in d6853f5 — now uses obj.best_prediction (cached property) instead of find_best_prediction().

```python
classification = detection.classifications.create(
    taxon=taxon,
    score=score,
    timestamp=datetime.datetime.now(),
```

Copilot AI Apr 8, 2026


Test uses datetime.datetime.now() for a model timestamp. With USE_TZ=True this can create naive datetimes and warnings/errors depending on field settings. Prefer django.utils.timezone.now() (or reuse the source_image.timestamp) to keep timezone handling consistent across tests.

Suggested change

```diff
-timestamp=datetime.datetime.now(),
+timestamp=source_image.timestamp,
```



Claude says: Fixed in d6853f5 — now uses source_image.timestamp instead of datetime.now().
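The underlying issue: `datetime.datetime.now()` returns a naive datetime (no `tzinfo`), which Django warns about or rejects for `DateTimeField` when `USE_TZ=True`; `django.utils.timezone.now()` returns an aware one. A stdlib-only illustration of the distinction:

```python
from datetime import datetime, timezone

naive = datetime.now()              # tzinfo is None: "naive", what the test used
aware = datetime.now(timezone.utc)  # tzinfo set: "aware", safe with USE_TZ=True

print(naive.tzinfo is None, aware.tzinfo is not None)  # True True
```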


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/exports/format_types.py`:
- Around line 192-204: Use the annotated winning detection already attached to
the exported object instead of recomputing it: in
get_best_detection_source_image_url and get_best_detection_occurrence_url, read
the annotated detection from the obj (e.g. getattr(obj, "best_detection", None)
or the specific annotated attributes like
best_detection_source_image_path/public_base_url on that detection) and build
both the source image URL and the occurrence/context URL from that detection's
stored fields (path, public_base_url or signed_url, and occurrence/context link)
rather than calling obj.context_url() which re-queries and may use a different
ordering; this ensures URLs correspond to the exported bbox/path and avoids N+1
queries.

In `@ami/main/api/serializers.py`:
- Around line 1396-1421: get_best_machine_prediction currently calls
obj.find_best_prediction() and directly accesses prediction.taxon and
prediction.algorithm, causing an N+1; change it to reuse obj.best_prediction
(use that if present, fall back to obj.find_best_prediction() only if needed)
and avoid dereferencing relations that aren't prefetched—either use the
already-attached taxon/algorithm on the prediction object or ensure these are
provided via queryset annotations/prefetch_related and passed through context to
TaxonNestedSerializer/AlgorithmNestedSerializer to prevent per-row queries;
update get_best_machine_prediction to check obj.best_prediction first and only
call find_best_prediction as a fallback, and document that views/queries should
prefetch prediction__taxon and prediction__algorithm.

In `@ami/main/models.py`:
- Around line 3210-3218: The code only sets new_score when new_determination
changes, so when authority flips but the taxon stays the same we never recompute
determination_score; update the logic around
occurrence.find_best_identification() and occurrence.find_best_prediction()
(references: top_identification, top_prediction, new_determination, new_score,
determination_score) so that whenever the authority changes you still set
new_score appropriately (None for a human top_identification,
top_prediction.score for an ML top_prediction) and assign determination_score on
the occurrence even if the taxon equals current_determination; apply the same
fix in the analogous block handling lines ~3225-3228.
- Around line 2864-2867: The aggregate verified_by_count is over-counting
because Count("identifications") is performed across detection rows; change the
Count to be distinct so each identification is only counted once (e.g. use
Count("identifications", distinct=True,
filter=Q(identifications__withdrawn=False))). Update the verified_by_count
definition (the Count call) to include distinct=True while keeping the existing
filter on identifications.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca26e0c4-7c2a-409c-8ae0-af43172af29c

📥 Commits

Reviewing files that changed from the base of the PR and between 83a970c and f21ec4d.

📒 Files selected for processing (4)
  • ami/exports/format_types.py
  • ami/exports/tests.py
  • ami/main/api/serializers.py
  • ami/main/models.py

- Add distinct=True to verified_by_count to prevent join multiplication
- Fix update_occurrence_determination to recompute score even when taxon
  stays the same (handles authority flip without taxon change)
- Remove email fallback in verified_by (PII concern, name-only)
- Filter withdrawn identifications in verification_status fallback
- Use obj.best_prediction cached property instead of find_best_prediction()
  in API serializer to avoid N+1 queries
- Build occurrence URL from annotated fields instead of context_url()
  to avoid N+1 queries in export
- Use source_image.timestamp instead of datetime.now() in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

annavik commented Apr 8, 2026

From reading the PR description, this looks great Michael!

Some comments and questions:

  • verified_by_count: From the PR description it sounds like this is considering all human identifications. Since we use the word "verified", maybe this should only consider identifications in line with the final determination?

  • agreed_with_algorithm: If you agree with an algorithm, and then I agree with you, have I also agreed with the algorithm...?

  • best_detection_occurrence_url: There is a risk this URL could break if we change the environment URL; do you think our URLs are stable enough to start distributing in exports?

  • determination_matches_machine_prediction: I guess this is nice to have, but this information can also be derived from other columns. We could skip if we want to keep things compact! :)


Development

Successfully merging this pull request may close these issues.

feat(exports): add machine prediction, verification, and detection fields to exports

3 participants