Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion backend/app/crud/content_crud.py
Original file line number Diff line number Diff line change
Expand Up @@ -311,7 +311,7 @@ def get_dictionary_index(db_: Session, resource_id: int):
entries.append((word.word_id, form))

# Sort all entries by word form before grouping
entries.sort(key=lambda x: x[1])
entries.sort(key=lambda x: x[1].lower())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Using lower() does not provide full Unicode case-insensitive ordering, so words with special case mappings (for example ß/SS or Turkish İ/i) can still be sorted incorrectly. If this endpoint is intended to be truly case-insensitive for dictionary data, normalize with casefold() (optionally with Unicode normalization) in the sort key. [logic error]

Severity Level: Major ⚠️
- ⚠️ /dictionary/{id}/index returns inconsistently ordered Unicode words.
- ⚠️ Dictionary browsing harder for languages with special casing.
Steps of Reproduction ✅
1. Ensure a DICTIONARY resource exists in `db_models.Resource` with `resource_id=1` and
`content_type='dictionary'`, which is validated in
`backend/app/crud/content_crud.py:22-35` within `get_dictionary_index`.

2. Insert rows into `db_models.Dictionary` for that resource where `word_forms` (queried
at `backend/app/crud/content_crud.py:38-41`) contains comma-separated Unicode words with
special case mappings, e.g. `"Straße,Strasse"` or `"İstanbul,istanbul"`.

3. Call the HTTP endpoint `GET /dictionary/1/index` defined in
`backend/app/router/content.py:15-39`, which invokes
`content_crud.get_dictionary_index(db_, resource_id=1)` at
`backend/app/router/content.py:31-34`.

4. Observe in the JSON response that within each `letter` group (built at
`backend/app/crud/content_crud.py:58-66`), the `words` list ordering is determined by
`entries.sort(key=lambda x: x[1].lower())` at `backend/app/crud/content_crud.py:314`;
Python `str.lower()` does not implement full Unicode case-folding (e.g., `ß` vs `ss`,
Turkish `İ` vs `i`), so ordering for such characters is not truly case-insensitive.
Changing the sort key to use `x[1].casefold()` (optionally combined with Unicode
normalization) would provide more correct case-insensitive ordering for dictionary entries
containing these characters.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** backend/app/crud/content_crud.py
**Line:** 314:314
**Comment:**
	*Logic Error: Using `lower()` does not provide full Unicode case-insensitive ordering, so words with special case mappings (for example `ß`/`SS` or Turkish `İ`/`i`) can still be sorted incorrectly. If this endpoint is intended to be truly case-insensitive for dictionary data, normalize with `casefold()` (optionally with Unicode normalization) in the sort key.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎


# Group by first letter of the word form
index_dict = {}
Expand Down
Loading