Add Kreuzberg document intelligence MCP shared workflow #28392
Merged
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5c34a633-2709-4016-858d-9b24db09c75d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot created this pull request from a session on behalf of pelikhan, April 25, 2026 03:56
Pull request overview
Adds a shared MCP server configuration for the Kreuzberg document intelligence engine, enabling read-only document extraction and related utilities via a containerized stdio JSON-RPC 2.0 server.
Changes:
- Introduces a new shared MCP import file defining the Kreuzberg container server and a curated read-only tool allowlist.
- Mounts the GitHub workspace read-only to allow `extract_file`/`batch_extract_files` to resolve repository paths.
- Includes inline documentation describing available/excluded tools, container image options, and usage.
| File | Description |
|---|---|
| .github/workflows/shared/mcp/kreuzberg.md | Defines the Kreuzberg MCP server (container + mounts + allowed tools) for reuse via imports. |
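As a rough illustration of what "stdio JSON-RPC 2.0" means for a tool such as `extract_file`, the sketch below builds one request line the way an MCP client might write it to the server's stdin. The `tools/call` method and the `name`/`arguments` envelope come from the MCP specification; the helper name `build_tool_call` and the file path are hypothetical, and a real session would first perform the `initialize` handshake, which is omitted here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line for a stdio MCP server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # MCP method used to invoke a tool
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio transport: one JSON message per line, so no embedded newlines
    return json.dumps(message, separators=(",", ":"))


# Hypothetical repository path, for illustration only
line = build_tool_call(1, "extract_file", {"path": "docs/report.pdf"})
print(line)
```

Because the workspace is mounted at the same path inside the container, a path like the one above would need no translation before being sent.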
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 1
Comment on lines +57 to +68
|| Tool | Params | Description |
||---|---|---|
|| `extract_file` | `path` | Extract text and metadata from a local file |
|| `extract_bytes` | `data` (base64) | Extract from base64-encoded file content |
|| `batch_extract_files` | `paths` | Extract multiple files in one call |
|| `detect_mime_type` | `path` | Identify a file's MIME type |
|| `list_formats` | — | List all supported file formats |
|| `get_version` | — | Return the library version string |
|| `embed_text` | `texts` | Generate embedding vectors for text chunks |
|| `chunk_text` | `text` | Split text into overlapping chunks |
|| `cache_stats` | — | Report how much content is cached |
|| `cache_manifest` | — | Return model checksums |
The markdown table under "Available tools (read-only)" uses double leading pipes (`|| ...`) on each row, which is not valid table syntax and makes the table hard to read/parse in raw form. Use standard single-pipe table rows (e.g., `| Tool | Params | Description |`, `|---|---|---|`, etc.).
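The suggested fix can be applied mechanically. A minimal sketch, assuming every affected row starts with exactly two pipes at the beginning of the line (the function name `fix_double_pipes` is hypothetical, and a row that intentionally begins with an empty cell would also be rewritten):

```python
import re


def fix_double_pipes(markdown: str) -> str:
    """Collapse a doubled leading pipe on each line into a single pipe."""
    return re.sub(r"^\|\|", "|", markdown, flags=re.MULTILINE)


rows = "\n".join([
    "|| Tool | Params | Description |",
    "||---|---|---|",
    "|| `extract_file` | `path` | Extract text and metadata from a local file |",
])
fixed = fix_double_pipes(rows)
print(fixed)
```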
This was referenced Apr 25, 2026
github-actions Bot added a commit that referenced this pull request Apr 25, 2026
- Add Kreuzberg document intelligence MCP server as a dedicated row in the shared MCP configurations table in mcps.md (#28392)
- Update the observability glossary entry to mention github.workflow_ref as a resource attribute on all OTel spans, enabling workflow filtering in multi-workflow repositories (#28358)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a shared MCP component wrapping the Kreuzberg document intelligence engine — 97+ format extraction (PDF, DOCX, images via Tesseract OCR, legacy Office via LibreOffice) over stdio JSON-RPC 2.0.
Changes
.github/workflows/shared/mcp/kreuzberg.md
- Container image `ghcr.io/kreuzberg-dev/kreuzberg:latest`, entrypoint arg `mcp`
- Mounts the workspace read-only (`${GITHUB_WORKSPACE}:${GITHUB_WORKSPACE}:ro`) so `extract_file`/`batch_extract_files` resolve paths without remapping
- Imported via `imports: [shared/mcp/kreuzberg.md]`.
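To make the container settings concrete, here is a sketch of a `docker run` invocation carrying the same image, entrypoint argument, and read-only mount. How the workflow runtime actually launches the container is not shown in this PR, so the `--rm` and `-i` flags and the helper name `kreuzberg_argv` are assumptions for illustration.

```python
import os

IMAGE = "ghcr.io/kreuzberg-dev/kreuzberg:latest"


def kreuzberg_argv(workspace: str) -> list[str]:
    """Build a docker argv with the image, mount, and entrypoint arg from the PR."""
    return [
        "docker", "run",
        "--rm",  # assumed: remove the container when it exits
        "-i",    # assumed: keep stdin open for stdio JSON-RPC
        "-v", f"{workspace}:{workspace}:ro",  # identical path on both sides, read-only
        IMAGE,
        "mcp",   # entrypoint arg that starts the MCP server
    ]


# Falls back to the current directory when run outside GitHub Actions
workspace = os.environ.get("GITHUB_WORKSPACE", os.getcwd())
print(" ".join(kreuzberg_argv(workspace)))
```

Mounting the workspace at the same path on both sides is what lets file paths be passed to the tools unchanged.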