
fix: repair all 6 knowledge import endpoints (0/6 → 6/6 working)#115

Merged
Steake merged 7 commits into main from
copilot/repair-broken-knowledge-endpoints
Mar 6, 2026

Contributor

Copilot AI commented Mar 6, 2026

Description

All 6 knowledge import endpoints returned 503 because a single try/except block coupled knowledge_ingestion_service, knowledge_management_service, and knowledge_pipeline_service imports together. When thinc (spaCy dep) was unavailable, the entire block failed, setting KNOWLEDGE_SERVICES_AVAILABLE = False.

Import chain fix

  • Split the monolithic import into 3 independent try/except blocks so ingestion works even when management/pipeline deps are missing
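
A minimal sketch of the decoupling idea, assuming each service lives in its own module. The `optional_import` helper, the `knowledge_management` and `knowledge_pipeline_service` module paths, and the per-service flag names are hypothetical and not taken from the PR. The helper wraps one try/except per call, which has the same effect as three separate blocks.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def optional_import(module_name):
    """Return the imported module, or None if importing it fails for any reason."""
    try:
        return importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or an error raised while a dependency loads
        logger.warning("Optional module %s unavailable: %s", module_name, exc)
        return None


# Hypothetical module paths. Each import is attempted on its own, so a failure
# in one (for example a missing thinc/spaCy) does not affect the others.
ingestion_module = optional_import("backend.knowledge_ingestion")
management_module = optional_import("backend.knowledge_management")
pipeline_module = optional_import("backend.knowledge_pipeline_service")

# Hypothetical per-service flags in place of one shared flag.
KNOWLEDGE_INGESTION_AVAILABLE = ingestion_module is not None
KNOWLEDGE_MANAGEMENT_AVAILABLE = management_module is not None
KNOWLEDGE_PIPELINE_AVAILABLE = pipeline_module is not None
```

With separate flags, an ingestion endpoint only needs to check the ingestion flag before deciding whether to return 503.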

Missing endpoints

  • Added GET /api/knowledge/import/status/{job_id} (delegates to existing progress handler)
  • Added DELETE /api/knowledge/import/cancel/{job_id} (delegates to cancel logic)

Stub replacements

  • POST /api/knowledge/import/batch — was generating fake IDs; now uses BatchImportSchema from the shared schema layer and routes through KnowledgeIngestionService for text/URL sources. Returns per-item results array with individual status/error for transparency.
  • DELETE /api/knowledge/import/{import_id} — was hardcoded "cancelled"; now calls cancel_import() and reports actual state
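
The per-item results idea for the batch endpoint can be sketched roughly as follows. `FakeIngestionService`, its `import_from_text` method, and the response field names are assumptions made for illustration; the real service interface and response shape are not shown in this PR description.

```python
import asyncio
import uuid


class FakeIngestionService:
    """Hypothetical stand-in for the ingestion service; rejects empty text."""

    async def import_from_text(self, content):
        if not content.strip():
            raise ValueError("content is empty")
        return f"import-{uuid.uuid4().hex[:8]}"


async def import_batch(service, items):
    """Enqueue each item independently and record its individual outcome."""
    results = []
    for index, item in enumerate(items):
        try:
            import_id = await service.import_from_text(item)
            results.append({"index": index, "import_id": import_id, "status": "queued"})
        except Exception as exc:
            # A failed item is reported as failed, not hidden behind a generated ID.
            results.append(
                {"index": index, "import_id": None, "status": "failed", "error": str(exc)}
            )
    queued = sum(1 for result in results if result["status"] == "queued")
    return {"batch_size": len(items), "queued": queued, "results": results}


response = asyncio.run(import_batch(FakeIngestionService(), ["some text", "   "]))
```

Here the second item is whitespace only, so its entry carries `status: "failed"` and an error message while the first is queued. A client can tell exactly which items were accepted.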

Bug fixes

  • except Exception was swallowing HTTPException(400) and re-raising as 500 in all four import endpoints (url/text/file/wikipedia). Added except HTTPException: raise before the generic handler.
  • Removed unreachable dead code (duplicate except block) in URL import
  • Fixed over-indented except block in file import handler that would cause IndentationError
  • Fixed file import response to return the actual determined_file_type instead of the raw form field
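
The HTTPException swallowing bug and its fix can be illustrated with a small framework-free sketch. The `HTTPException` class below is a local stand-in for the framework's error type, and `validate`, `import_url_buggy`, `import_url_fixed`, and `status_of` are hypothetical names, not the handlers in backend/unified_server.py.

```python
class HTTPException(Exception):
    """Minimal stand-in for a web framework's HTTP error type (illustration only)."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def validate(payload):
    """Reject payloads without a URL using a deliberate 400."""
    if not payload.get("url"):
        raise HTTPException(400, "url is required")


def import_url_buggy(payload):
    try:
        validate(payload)
        return {"status": "queued"}
    except Exception as exc:  # also catches HTTPException(400) and rewraps it
        raise HTTPException(500, f"URL import error: {exc}")


def import_url_fixed(payload):
    try:
        validate(payload)
        return {"status": "queued"}
    except HTTPException:
        raise  # pass deliberate HTTP errors through unchanged
    except Exception as exc:
        raise HTTPException(500, f"URL import error: {exc}")


def status_of(handler, payload):
    """Return the status code a client would observe for this call."""
    try:
        handler(payload)
        return 200
    except HTTPException as exc:
        return exc.status_code
```

With an empty payload, `status_of(import_url_buggy, {})` yields 500 while `status_of(import_url_fixed, {})` yields 400. The fix works because Python tries `except` clauses in order, so the more specific clause has to come first.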

Schema integration (merged from main)

Related Issues

Test Evidence

18 new async tests via httpx.AsyncClient against the ASGI app (no live server). All 146 backend + integration tests pass:

tests/backend/test_knowledge_import_endpoints.py    18 passed
tests/backend/                                     127 passed, 20 skipped
tests/integration/test_api_schema_contracts.py      19 passed

CodeQL: 0 alerts (Python + JavaScript).

Checklist

  • Tests pass locally (pytest tests/)
  • Code is formatted (black . and isort .)
  • Documentation updated (if applicable)
  • No secrets or credentials committed
  • Related issue linked above
Original prompt

This section details the original issue you should resolve

<issue_title>feat: knowledge import system — repair all 6 broken endpoints (currently 0/6 working)</issue_title>
<issue_description>## Context
docs/MISSING_BROKEN_FUNCTIONALITY.md (generated from comprehensive E2E testing) reports the knowledge import system as 100% broken — 0/6 endpoints working. This is the primary user-facing value proposition of GödelOS; a consciousness operating system that cannot ingest knowledge is philosophically interesting and operationally worthless.

Broken Endpoints (from docs)

  • POST /api/knowledge/import/url — URL import
  • POST /api/knowledge/import/text — Direct text ingestion
  • POST /api/knowledge/import/file — File upload ingestion
  • GET /api/knowledge/import/status/{job_id} — Job status polling
  • POST /api/knowledge/import/batch — Batch import
  • DELETE /api/knowledge/import/cancel/{job_id} — Job cancellation

Tasks

  • Trace each endpoint from backend/api/knowledge_management_endpoints.py → service layer → backend/knowledge_ingestion.py
  • Identify where each fails (likely: schema mismatch, unhandled exception, uninitialised service)
  • Fix or implement the missing service logic in backend/knowledge_ingestion.py
  • Ensure KnowledgeIngestionService is properly initialised and injected at startup in backend/unified_server.py
  • Write pytest tests for all 6 endpoints using httpx.AsyncClient (no live server required)

Acceptance Criteria

All 6 endpoints return expected 2xx responses in the test suite. URL import successfully stores a parsed knowledge item retrievable via GET /api/knowledge.

Milestone: v0.5
</issue_description>

<agent_instructions>Repair all 6 knowledge import endpoints (currently 0/6 working). Trace each from route → service → ingestion layer. Fix or implement missing service logic. Ensure KnowledgeIngestionService initialises at startup. Write pytest tests for all 6 endpoints.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Custom agent used: Oli
Oliver Chimera Profile


Copilot AI and others added 2 commits March 6, 2026 05:11
- Separate knowledge_ingestion_service import from knowledge_management
  and knowledge_pipeline so thinc/spaCy failures don't disable ingestion
- Add GET /api/knowledge/import/status/{job_id} endpoint
- Add DELETE /api/knowledge/import/cancel/{job_id} endpoint
- Fix batch endpoint to use actual ingestion service
- Fix cancel endpoint to use actual ingestion service
- Fix HTTPException swallowing in URL/text/file/wikipedia endpoints
- Remove dead code (duplicate except block) in URL import
- Add 18 pytest tests for all 6 endpoints via httpx.AsyncClient

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>
…back ID

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix all broken endpoints in knowledge import system" to "fix: repair all 6 knowledge import endpoints (0/6 → 6/6 working)" Mar 6, 2026
@Steake Steake marked this pull request as ready for review March 6, 2026 09:13
@Steake Steake self-requested a review as a code owner March 6, 2026 09:13
Copilot AI review requested due to automatic review settings March 6, 2026 09:13
Contributor

Copilot AI left a comment

Pull request overview

Repairs the knowledge import API surface on backend/unified_server.py so the ingestion endpoints function even when optional knowledge-related dependencies fail to import, and adds/updates endpoint implementations + tests to cover the full 6-endpoint contract.

Changes:

  • Decouples knowledge service imports so ingestion can remain available independently of management/pipeline imports.
  • Adds missing status/cancel alias endpoints and replaces stubbed batch/cancel logic with ingestion-service backed behavior.
  • Adds an async HTTP-layer test suite covering all 6 knowledge import endpoints and updates an existing cancel assertion to allow not_found.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
backend/unified_server.py | Splits knowledge service imports; adds status/cancel aliases; fixes HTTPException handling; implements batch/cancel behavior via ingestion service.
tests/backend/test_knowledge_import_endpoints.py | New async tests for all 6 knowledge import endpoints using httpx.ASGITransport and a mocked ingestion service.
tests/backend/test_api_endpoints.py | Relaxes cancel assertion to accept either cancelled or not_found.

Comment on lines 2689 to 2690
logger.error(f"Error importing knowledge from file: {e}")
raise HTTPException(status_code=500, detail=f"File import error: {str(e)}")

Copilot AI Mar 6, 2026

The except Exception as e: block has over-indented statements (logger.error / raise) which will cause an IndentationError and prevent backend/unified_server.py from importing. Align the contents of the generic except block with the except indentation level.

Suggested change
logger.error(f"Error importing knowledge from file: {e}")
raise HTTPException(status_code=500, detail=f"File import error: {str(e)}")
logger.error(f"Error importing knowledge from file: {e}")
raise HTTPException(status_code=500, detail=f"File import error: {str(e)}")

@@ -2664,6 +2683,8 @@ async def import_knowledge_from_file(file: UploadFile = File(...), filename: str
"file_type": file_type

Copilot AI Mar 6, 2026

import_knowledge_from_file computes determined_file_type and passes it into FileImportRequest, but the HTTP response returns the raw file_type form field (which may be None/different). Return the actual determined_file_type so clients see the true interpreted type.

Suggested change
"file_type": file_type
"file_type": determined_file_type

Comment on lines 3151 to 3157
else:
import_ids.append(fallback_id)
except Exception as exc:
logger.warning(f"Batch item {i} failed: {exc}")
import_ids.append(fallback_id)

return {"import_ids": import_ids, "batch_size": len(import_ids), "status": "queued"}

Copilot AI Mar 6, 2026

import_knowledge_batch appends a generated fallback_id when ingestion is unavailable or when a batch item fails, but still returns status: "queued" without indicating which items failed to enqueue. This can mislead clients into thinking the entire batch was accepted. Consider returning per-item results (id + status/error) and/or using a non-2xx response for invalid items when the ingestion service is available.

level so that we test the HTTP layer in isolation.
"""

import asyncio

Copilot AI Mar 6, 2026

Unused import asyncio (not referenced anywhere in this test module). Removing it avoids lint noise and keeps the test focused.

Suggested change
import asyncio

@Steake
Owner

Steake commented Mar 6, 2026

@copilot PR #114 (shared schema contracts) has merged to main and created a conflict with this branch. Please rebase onto current main and resolve any conflicts — the knowledge import endpoints and the schema layer must coexist cleanly. Then mark ready for review.

- Use BatchImportSchema from shared schemas for batch endpoint
- Fix file_type response to return determined_file_type
- Fix over-indented except block in file import handler
- Add per-item results to batch response
- Remove unused asyncio import from test module
- Update validation tests: expect 422 for Pydantic validation errors

Co-authored-by: Steake <530040+Steake@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 6, 2026

@copilot PR #114 (shared schema contracts) has merged to main and created a conflict with this branch. Please rebase onto current main and resolve any conflicts — the knowledge import endpoints and ...

Merged main into the branch at 61170fe, resolving the batch endpoint conflict — now uses BatchImportSchema from the shared schema layer. Also addressed all review comments from the previous round (indentation fix, determined_file_type in response, per-item batch results, unused import removal). All 146 backend + schema contract tests pass.

Copilot AI and others added 2 commits March 6, 2026 17:29
…owledge-endpoints

# Conflicts:
#	backend/unified_server.py
#	godelOS/core_kr/knowledge_store/__init__.py
…ledge-endpoints (unblock PR #115)

merge: resolve conflicts — bring main into copilot/repair-broken-knowledge-endpoints (unblock PR #115)
@Steake Steake merged commit b8ed09a into main Mar 6, 2026
2 checks passed
@github-actions

github-actions bot commented Mar 6, 2026

🧪 CI — Python 3.11

ERROR tests/test_cognitive_subsystem_activation.py::TestEndToEndFlow::test_context_engine_round_trip - ModuleNotFoundError: No module named 'nltk'
ERROR tests/test_cognitive_subsystem_activation.py::TestEndToEndFlow::test_nlg_pipeline_process - ModuleNotFoundError: No module named 'nltk'
====== 4 failed, 924 passed, 81 skipped, 65 warnings, 22 errors in 34.35s ======

@github-actions

github-actions bot commented Mar 6, 2026

🧪 CI — Python 3.10

ERROR tests/test_cognitive_subsystem_activation.py::TestEndToEndFlow::test_context_engine_round_trip - ModuleNotFoundError: No module named 'nltk'
ERROR tests/test_cognitive_subsystem_activation.py::TestEndToEndFlow::test_nlg_pipeline_process - ModuleNotFoundError: No module named 'nltk'
====== 4 failed, 924 passed, 81 skipped, 65 warnings, 22 errors in 33.00s ======

Development

Successfully merging this pull request may close these issues.

feat: knowledge import system — repair all 6 broken endpoints (currently 0/6 working)

3 participants