A local-first coding sidekick for Antigravity, VS Code and Cursor. Stream live model output, chat with your repo, edit files, run terminal commands, and keep everything on-device without shipping your soul to the cloud.
This extension was built because I was tired of being ghosted by AI every time I hit 30,000 feet. No Wi-Fi? No problem. LLeM was made specifically for those long-haul flights where you just want to vibe and code without needing an internet connection.
Note
Credits & Origin: Huge shoutout to the OG inspiration, connect-ai. We took that foundation, gave it a massive refactor, and cranked everything up to eleven. We're talking boosted performance, fresh features, and a serious security audit to keep your local workflow locked down. We didn't just download it; we leveled it up.
Special Thanks: Seriously, LLeM wouldn't even exist if it weren't for connect-ai sharing their code with the world. Major respect for that open-source energy; you made this happen.
Fair Play: If you're planning to build on top of this or create something new based on this code, please keep the good karma flowing and shout out the original creators, connect-ai, and their contributions. Respect the hustle!
Important
LLeM is 100% local. Your code never leaves your machine. No cloud, no drama, just pure local intelligence.
LLeM now treats common context-mode utility commands as first-class MCP actions instead of sending them through the normal chat-completion path. That makes commands like ctx stats and /ctx_stats faster, cleaner, and less likely to confuse local models.
You can now run context-mode utilities directly from chat:
- `ctx stats`
- `ctx doctor`
- `ctx upgrade`
- `ctx purge`
- `ctx insight`
LLeM recognizes these as MCP utility commands, routes them to the matching context-mode tool, and displays the result back in the chat.
LLeM also understands slash-style aliases for the same tools:
- `/ctx_stats`
- `/ctx-stats`
- `/ctx_doctor`
- `/ctx-doctor`
- `/context-mode:ctx-stats`
- `/context-mode:ctx-doctor`
Hyphenated aliases are normalized to the MCP tool names internally, so /ctx-stats resolves to ctx_stats, and /context-mode:ctx-doctor resolves to ctx_doctor.
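The exact routing code isn't shown here, but the alias handling can be pictured as a small normalization step. A minimal TypeScript sketch, assuming a hypothetical `normalizeCtxAlias` helper (not LLeM's actual API):

```typescript
// Hypothetical helper: maps slash-style aliases onto MCP tool names.
function normalizeCtxAlias(input: string): string | null {
  const withoutSlash = input.trim().replace(/^\//, "");
  // Drop an optional "context-mode:" server prefix.
  const withoutServer = withoutSlash.replace(/^context-mode:/, "");
  // Hyphenated aliases become underscore-style MCP tool names.
  const toolName = withoutServer.replace(/-/g, "_");
  const knownTools = new Set(["ctx_stats", "ctx_doctor", "ctx_upgrade", "ctx_purge", "ctx_insight"]);
  return knownTools.has(toolName) ? toolName : null;
}

// normalizeCtxAlias("/ctx-stats")               -> "ctx_stats"
// normalizeCtxAlias("/context-mode:ctx-doctor") -> "ctx_doctor"
```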
MCP responses shaped like { content: [{ type: "text", text: "..." }] } are now unwrapped and displayed as plain text. Raw JSON is only shown as a fallback when LLeM cannot extract text content.
This fixes the context-mode stats display issue where the result could appear as a JSON code block instead of readable output.
The previous JSON rendering path could expose long separator lines from context-mode stats output. Some local models would echo those separators during follow-up turns, which could trip LLeM's repetition watchdog. With text-first MCP rendering, the stats output stays readable and avoids feeding noisy JSON wrappers back into the conversation.
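As a rough illustration of that text-first rendering rule, here is a minimal TypeScript sketch; the interfaces and the `renderMcpResult` name are assumptions, not the extension's real types:

```typescript
interface McpContentPart { type: string; text?: string; }
interface McpToolResult { content?: McpContentPart[]; }

// Prefer plain text parts; fall back to raw JSON only when no text is found.
function renderMcpResult(result: McpToolResult): string {
  const textParts = (result.content ?? [])
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text as string);
  return textParts.length > 0 ? textParts.join("\n") : JSON.stringify(result, null, 2);
}
```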
This update also adds regression tests for the new context-mode command parsing behavior, including slash aliases and natural utility commands.
Since LLeM is currently in early flight, we distribute it via .vsix files. Follow these steps to get airborne:
- Go to the LLeM GitHub Repository.
- Look at the Releases section on the right sidebar.
- Click on the latest release tag (e.g., `v3.5.12`).
- Under the Assets section, click on the `.vsix` file (e.g., `llem-3.5.12.vsix`) to download it to your machine.
- Open VS Code or Cursor.
- Open the Extensions view by clicking the square icon in the left sidebar or pressing `Cmd+Shift+X` (macOS) or `Ctrl+Shift+X` (Windows/Linux).
- Click the `...` (More Actions) menu icon in the top right corner of the Extensions view title bar.
- Select Install from VSIX... from the dropdown menu.
- Locate and select the `.vsix` file you just downloaded.
- Once installed, you might need to click Reload or restart your editor.
- Local-First Workflow: Connects directly to local engines like Rapid-MLX, LM Studio, or Ollama. No cloud, no API costs.
- Live Streaming: Real-time output rendered inside a custom VS Code chat panel with full Markdown and code block support.
- Agentic Actions: Trigger file creations, non-destructive edits, and terminal commands directly from the AI's response.
- Persistent History: Conversations are automatically saved to `~/.llem-history`, supporting session recovery, renaming, and bulk deletion.
- Workspace Awareness: Real-time monitoring of your project files. Drop files/folders into chat for instant, high-fidelity context injection.
- The Brain (Markdown Vault): Sync your notes with an Obsidian-compatible vault. Supports visual network maps and local Git synchronization.
- Performance First: Multi-layered caching, request throttling, and token-usage monitoring to keep your dev environment snappy.
- Model-Aware Prompt Budgeting: Automatically trims prompt weight for big local models so 24B+ and 26B-class runs stay responsive instead of drowning in context.
- Built-In Diagnostics: Inspect prompt size, first-token delay, section-by-section context weight, and streaming throughput directly from the LLeM diagnostics panel.
To get started, you'll need a local model runtime running on your machine.
Typical URL: http://127.0.0.1:8000
Rapid-MLX is supported as an OpenAI-compatible local engine. LLeM talks to its /v1/chat/completions streaming endpoint and reads installed models from /v1/models.
```bash
# Install and serve a model on the default Rapid-MLX port
pip install vllm-mlx
rapid-mlx serve qwen3.5-4b --port 8000
```

Then point `llem.engineUrl` at `http://127.0.0.1:8000` or pick Rapid-MLX from Settings -> Swap model engine.
Typical URL: http://127.0.0.1:11434
```bash
# Pull a model and serve
ollama pull gemma4:e4b
ollama serve
```

For larger local runs, a 24B+ Gemma-family model is a better fit for the new performance profile flow:

```bash
# Example 26B-class local setup
ollama pull gemma6:26b
ollama serve
```

Typical URL: http://127.0.0.1:1234
- Download and load your favorite model.
- Enable the Local Server.
- Confirm the server is active.
- Point `llem.engineUrl` at `http://127.0.0.1:1234` or pick LM Studio from Settings -> Swap model engine.
LLeM treats LM Studio as an OpenAI-compatible local engine and automatically normalizes the request path to /v1/chat/completions.
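That normalization boils down to appending the standard OpenAI-compatible path when it is missing. A minimal sketch, assuming a hypothetical `chatCompletionsUrl` helper rather than LLeM's actual code:

```typescript
// Hypothetical helper: resolve the chat endpoint for an OpenAI-compatible engine.
function chatCompletionsUrl(engineUrl: string): string {
  const base = engineUrl.replace(/\/+$/, ""); // drop trailing slashes
  if (base.endsWith("/v1/chat/completions")) return base;
  if (base.endsWith("/v1")) return `${base}/chat/completions`;
  return `${base}/v1/chat/completions`;
}

// chatCompletionsUrl("http://127.0.0.1:1234")    -> "http://127.0.0.1:1234/v1/chat/completions"
// chatCompletionsUrl("http://127.0.0.1:8000/v1") -> "http://127.0.0.1:8000/v1/chat/completions"
```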
Open your VS Code settings.json to customize the experience.
| Setting | Description | Default |
|---|---|---|
| `llem.engineUrl` | Local/remote model endpoint URL. Supports Rapid-MLX (`:8000`), LM Studio (`:1234`), OpenAI-compatible `/v1` servers, and Ollama (`:11434`). | `http://127.0.0.1:11434` |
| `llem.defaultModel` | The default model slug used for requests. | `gemma4:e4b` |
| `llem.performancePreset` | Prompt and generation budget profile. Use `auto`, `balanced`, or `large-local-26b`. | `auto` |
| `llem.requestTimeout` | Request timeout in seconds. | `300` |
| `llem.vaultPath` | Path to your markdown vault. | `~/.llem-vault` |
| `llem.bridgeEnabled` | Enable the local HTTP bridge on port 4825. | `false` |
| `llem.bridgeToken` | Security token for authenticated bridge callers. | (empty) |
| `llem.mcpEnabled` | Enable MCP server discovery and tool calls. | `true` |
| `llem.mcpServers` | MCP servers registered directly in LLeM. | `{}` |
| `llem.mcpConfigSources` | MCP sources to resolve. | `["llem", "workspace", "codex-global", "codex-project"]` |
| `llem.mcpConfigPaths` | Extra MCP JSON/TOML config paths to import. | `[]` |
| `llem.mcpToolTimeoutSeconds` | Timeout for MCP startup, listing, and calls. | `60` |
| `llem.maxHistoryItems` | Maximum number of sessions to keep in history. | `100` |
Tip
If you're using a slower model or long prompts, try bumping up `llem.requestTimeout`.
LLeM can register and run MCP servers with Codex-style action tags:
```
<list_mcp_tools/>
<call_mcp_tool server="context7" tool="resolve-library-id">{"libraryName":"react"}</call_mcp_tool>
```

Direct LLeM config lives in `llem.mcpServers`:
"llem.mcpServers": {
"context7": {
"command": "npx",
"args": ["-y", "@upstash/context7-mcp"],
"env": {},
"enabled": true
}
}LLeM also syncs Codex MCP settings from $CODEX_HOME/config.toml, ~/.codex/config.toml, and <workspace>/.codex/config.toml. Before applying a sync it shows a diff with Added, Removed, and Changed servers; environment values are masked and only changed keys are shown. Synced snapshots are stored in ~/.llem/llem-mcp-synced.json instead of VS Code settings, and user-owned llem.mcpServers are never deleted or modified by Codex sync.
Use Settings -> MCP servers -> Import MCP from GitHub URL to paste an MCP repository URL. LLeM reads README/package/config examples, previews the inferred server command, and imports it only after approval.
v1 runs stdio MCP servers only. HTTP/SSE/remote entries can be imported and listed, but tool calls report them as unsupported.
LLeM reads markdown files from ~/.llem/prolog before every prompt. Files are loaded in 0-9, A-Z filename order and injected directly after the built-in system prompt as mandatory prolog instructions. Use this for durable local routing rules, house style, or workflow constraints that should apply to every request.
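Conceptually, the prolog step just reads the markdown files in filename order and concatenates them behind the system prompt. A minimal Node/TypeScript sketch of that idea (the function name is illustrative, not the extension's actual code):

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Illustrative only: gather ~/.llem/prolog/*.md in 0-9, A-Z filename order.
function loadPrologInstructions(): string {
  const dir = path.join(os.homedir(), ".llem", "prolog");
  if (!fs.existsSync(dir)) return "";
  return fs
    .readdirSync(dir)
    .filter((name) => name.toLowerCase().endsWith(".md"))
    .sort() // lexicographic: digits sort before letters
    .map((name) => fs.readFileSync(path.join(dir, name), "utf8"))
    .join("\n\n");
}
```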
- If terminal commands fail on Windows, confirm `node`, `npm`, and `npx` are available in the VS Code process environment.
- If a model response suggests an edit that has no effect, use `read_file` first and retry with the current file content.
For bigger local models such as `gemma6:26b` or other 24B+ Gemma-family builds:

- prefer Rapid-MLX or LM Studio when you want an OpenAI-compatible local server,
- prefer Ollama when you want the native Ollama `/api/chat` path and local manifest-based capability hints,
- switch `llem.performancePreset` to `large-local-26b` if you want tighter prompt budgets immediately,
- keep `llem.performancePreset` on `auto` if you want LLeM to detect 26B-class models by name or metadata,
- raise `llem.requestTimeout` to around `600` seconds on slower or memory-constrained machines,
- pair a 26B default with a smaller fallback model if you want fast iteration for simple edits.

Current-machine guidance:

- on Apple Silicon systems around the 34 GB class, `large-local-26b` is the recommended preset for 26B local models,
- on other machines, start with the same preset and only widen timeout or context if your hardware can comfortably handle it.
LLeM now exposes a model-sensitive prompt and generation budget setting through llem.performancePreset.
- `auto`: Recommended default. LLeM checks the selected model name and, when available, Ollama metadata such as `parameter_size`. If the model looks like a 24B+ local run, it automatically switches into the 26B profile.
- `balanced`: Keeps the wider default context and generation budget. This is the better fit for smaller local models when raw responsiveness is already good.
- `large-local-26b`: Uses a tighter prompt budget and smaller Ollama generation window so big local models spend less time chewing through workspace context before the first token lands.
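The `auto` heuristic can be thought of as a simple size check on the model name or metadata. A hedged TypeScript sketch; the threshold and function name are assumptions, not the shipped implementation:

```typescript
type PerformanceProfile = "balanced" | "large-local-26b";

// Illustrative heuristic: read a parameter count like "26b" from the model name
// or from Ollama's parameter_size metadata, then pick the profile.
function resolveProfile(modelName: string, parameterSize?: string): PerformanceProfile {
  const source = parameterSize ?? modelName;
  const match = source.match(/(\d+(?:\.\d+)?)\s*b\b/i);
  const billions = match ? parseFloat(match[1]) : 0;
  return billions >= 24 ? "large-local-26b" : "balanced";
}

// resolveProfile("gemma6:26b")         -> "large-local-26b"
// resolveProfile("gemma4:e4b", "4.5B") -> "balanced"
```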
When large-local-26b is active, LLeM intentionally becomes more selective about context:
- active editor context gets first priority,
- attached text files are budgeted per file and across the whole turn,
- workspace tree and vault index are clipped more aggressively,
- and older low-relevance chat history is pruned before the current request is allowed to grow out of control.
This is designed to improve real-world latency, not benchmark token counts in isolation. The point is to keep the answer useful while reducing the hidden prompt tax that large local models pay.
Use LLeM: Show Diagnostics when tuning a larger model. The diagnostics channel now surfaces the key numbers you need:
- selected model and resolved performance profile,
- estimated prompt size before send,
- final request size after pruning,
- history, attachment, active-editor, workspace, and vault character breakdowns,
- pruned message count and attachment trim amount,
- first-token latency,
- total stream duration,
- and token throughput.
If a 26B-class model still feels sluggish, the fastest knobs to check are:
- `llem.performancePreset`
- `llem.requestTimeout`
- total attachment size in the current turn
- whether the active file or vault index is unusually large
In practice, this makes it much easier to see whether the bottleneck is model load time, prompt size, or generation speed.
- Node.js (v18+)
- npm
- Compile: `npm run compile`
- Build VSIX: `npm run package:vsix`
- Local Test VSIX: `npm run package:vsix:local`
- Context Limits: Large file attachments might hit the context window limit of your local model.
- Large-Model Warmup: The first request to a 24B+ local model can still feel slow even after prompt trimming, especially right after loading the model into memory.
- Server Check: Make sure your local engine (Rapid-MLX/LM Studio/Ollama) is actually running before you start chatting.
v3.1.1 builds on the v3.1.0 chat UX refresh and adds the missing piece: editing earlier user messages in a Gemini Web-style flow.
You can now go back to a previous user message, click Edit, revise the prompt, and continue from there.
- the old thread stays intact,
- LLeM creates a new branch from the point before that message,
- the edited prompt is resubmitted into that branch,
- and any reusable attached files from the original message can travel with the edit flow.
This keeps the conversation history safe while making prompt iteration much faster and less destructive.
With Copy, Branch, Edit, Like, and Dislike, each finished exchange can now be reused in multiple ways:
- Copy a strong answer,
- Branch an assistant response into a new direction,
- Edit a user message to retry from an earlier point,
- Like a response style you want repeated,
- Dislike a response style you want avoided later.
That makes LLeM feel much closer to modern consumer chat tools while staying inside VS Code and staying local-first.
Preference memory continues to apply across:
- normal follow-up turns,
- new chats,
- chat branches,
- and edited-message branches.
So if you teach LLeM what kind of answers you like, that preference signal survives even when you fork or revise the conversation path.
Clickable file references in chat are now more accurate:
- only editable file types can be opened from chat,
- basename-only references like `extension.ts` can resolve to a real workspace file when the match is unambiguous,
- and chat attachments preserve enough metadata to reopen the right source more reliably.
The webview renderer now handles leftover inline Markdown markers more gracefully in normal prose.
- inline bold and emphasis markers render more reliably in mixed-language text,
- bullet items that mix bold Korean phrases with Latin terms such as `(Local-first)` now display with the intended emphasis,
- and the fallback logic avoids touching fenced code blocks while cleaning up visible chat output.
- added editable earlier-message branching from the webview action bar,
- preserved reusable attachment payloads in display history for edit/retry flows,
- added branch generation from the point before a selected user message,
- improved workspace filename resolution for clickable chat file references,
- added a safe inline-Markdown fallback for webview chat rendering,
- kept reply-style preference memory persistent across branch variants.
The big shift here is that LLeM is no longer just good at answering or branching. It is now better at revising. That means less copy-paste, less losing context, and much smoother iteration when you're tightening prompts or trying alternate implementation directions.
v3.1.0 is the first release that makes each completed reply feel more like a modern chat product, while still keeping the whole workflow local-first.
Once an assistant reply finishes streaming, LLeM now shows a compact action row directly under that message.
- Copy: Copies just that specific assistant response to your clipboard.
- Branch: Creates a brand-new chat branch from that response so you can explore a different direction without losing the original thread.
- Like: Marks that answer style as something the user wants more of.
- Dislike: Marks that answer style as something the user wants less of.
This interaction model is intentionally inspired by the post-reply controls you see in Gemini Web, but adapted to LLeM's local VS Code workflow.
Branching is now a first-class concept inside the chat experience.
- You can branch from any completed assistant response.
- The new branch becomes its own saved chat session.
- The original conversation remains untouched in history.
- The branch inherits the visible conversation context up to the selected reply, making it easy to explore alternate plans, implementations, or follow-up prompts.
This is especially useful when you want to:
- compare two implementation strategies,
- keep one thread focused on debugging while another explores a refactor,
- or preserve a "good state" before taking the conversation in a different direction.
Likes and dislikes are not cosmetic. They now update a persistent memory layer that survives:
- new chats,
- branched chats,
- and extension restarts.
When you give feedback on a reply, LLeM stores a compact memory of that preference and uses it to steer future responses. In practice, that means:
- replies you like help reinforce the kind of tone, structure, and answer shape you want,
- replies you dislike tell the assistant to avoid similar response patterns later unless you explicitly ask for them.
This preference memory is injected into the system context for future requests, so LLeM can adapt over time instead of acting like every conversation starts from zero.
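One way to picture that injection is a small prompt-assembly step that appends liked and disliked style notes to the base system prompt. A minimal sketch with assumed shapes; `ReplyPreference` and `buildSystemContext` are illustrative names only:

```typescript
interface ReplyPreference { kind: "like" | "dislike"; summary: string; }

// Illustrative only: fold stored feedback into the system context for the next request.
function buildSystemContext(basePrompt: string, prefs: ReplyPreference[]): string {
  const liked = prefs.filter((p) => p.kind === "like").map((p) => `- ${p.summary}`);
  const disliked = prefs.filter((p) => p.kind === "dislike").map((p) => `- ${p.summary}`);
  const sections = [basePrompt];
  if (liked.length > 0) sections.push(`Preferred reply styles:\n${liked.join("\n")}`);
  if (disliked.length > 0) sections.push(`Reply styles to avoid:\n${disliked.join("\n")}`);
  return sections.join("\n\n");
}
```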
This release also tightens the file interaction model inside chat:
- only editable file types are shown as clickable in message content,
- only editable attachments can be opened from chat,
- and dropped file attachments preserve enough metadata to open the correct source more reliably.
That keeps chat interactions cleaner and avoids misleading "clickable" affordances on files that are not actually editable in the intended way.
Under the hood, this release adds several important building blocks:
- a shared editable-file classifier used by both the webview and extension host,
- per-message feedback state in persisted chat history,
- a new response preference manager backed by extension global state,
- message-level UI actions for copy, branching, and feedback,
- and branch session generation from the currently visible conversation timeline.
LLeM has always focused on local execution, real file edits, and practical repo-aware assistance. With v3.1.0, the chat UX becomes much more iterative:
- you can fork thought paths without losing your place,
- quickly reuse or share strong replies,
- and gradually teach the assistant how you want it to respond.
Still local. Still yours. Just much more adaptable.
Sup world! v3.0.5 is officially out in the wild and it's our first public release.
- Branding on Point: We ditched the boring stuff for a fresh icon and a UI that actually looks good.
- Gemma Optimization: We tweaked the engine to hunt down Ollama's or LM Studio's default model automatically.
- Chat History 2.0: Full persistence layer implemented. Your conversations now survive VS Code restarts.
- Workspace Sync: Instant UI updates when you rename, delete, or add files to your project.
- Security Audit: Completed a deep-dive security pass on the Bridge Server, adding rate limiting and token-based auth.
- Better Vibes: Smoother logging and descriptive errors so you're never left guessing.
- Public Launch: This is it. The first time we're letting this thing out of the hangar for everyone to use.
Local-first, offline-always. Let's cook.
- Support context-mode utility command aliases and render MCP text results directly.
- Packaged `release/llem-3.5.12.vsix`.
- Show MCP slash command results directly in chat responses.
- Packaged `release/llem-3.5.11.vsix`.
- MCP slash commands such as /ctx_stats now execute directly and placeholder MCP server names are resolved from available tools.
- Packaged `release/llem-3.5.10.vsix`.
- Added execution modes and Windows HOME prolog loading
- removed profile fallback from prolog discovery
- Packaged `release/llem-3.5.9.vsix`.
- Moved the active neon animation from the header tagline to the Running now queue card
- Packaged `release/llem-3.5.8.vsix`.
- Moved MCP runtime modules into src/mcp for clearer source organization
- Packaged `release/llem-3.5.7.vsix`.
- Prevent partial file paths from expanding into nested MCP folders
- Packaged `release/llem-3.5.6.vsix`.
- Added ~/.llem/prolog markdown prolog loading before every prompt
- Applied prolog files in numeric and alphabetical filename order
- Documented prompt prolog behavior in README
- Packaged `release/llem-3.5.5.vsix`.
- Recognized codex as an MCP config source alias so synced Codex servers appear in LLeM server lists
- Packaged `release/llem-3.5.4.vsix`.
- Moved synced MCP snapshots from VS Code settings into ~/.llem/llem-mcp-synced.json
- Documented the home-profile MCP sync storage path in README
- Added neon highlighting for the active running queue prompt
- Removed the header neon underline below the LLeM tagline
- Packaged `release/llem-3.5.3.vsix`.
- Added MCP server discovery and stdio tool-call support
- Added Codex MCP config sync with diff preview and masked environment changes
- Added GitHub MCP server import flow
- Packaged `release/llem-3.5.2.vsix`.
v3.5.0 expands LLeM's local engine support beyond the original Ollama/LM Studio flow and makes OpenAI-compatible MLX runtimes a first-class path.
- Bumped the extension version from `3.4.3` to `3.5.0`.
- Added Rapid-MLX support as a local OpenAI-compatible backend.
- Added automatic first-run discovery for Rapid-MLX at `http://127.0.0.1:8000`.
- Added `/v1/models` model discovery for Rapid-MLX and other OpenAI-compatible local engines.
- Normalized Rapid-MLX requests to `/v1/chat/completions`, matching the same streaming chat shape used by LM Studio.
- Kept LM Studio support explicit at `http://127.0.0.1:1234`, including `/v1/chat/completions` and `/v1/models`.
- Kept Ollama support on the native `http://127.0.0.1:11434` path with `/api/chat`, `/api/tags`, and Ollama metadata/capability checks.
- Updated the Settings menu so Rapid-MLX, LM Studio, and Ollama can all be selected from Swap model engine.
- Updated active runtime labels so prompts and diagnostics report `Rapid-MLX`, `LM Studio`, `Ollama`, or a generic OpenAI-compatible local engine instead of mislabeling every `/v1` endpoint as LM Studio.
- Improved connection and model-not-found guidance so errors point users toward the selected engine's expected local port and startup flow.
- Fixed the first action icon under chat results so it copies the message to the clipboard without pasting it into the composer or opening the edit-branch banner.
- Refreshed README setup guidance with Rapid-MLX install/serve examples, LM Studio selection notes, and the complete local engine compatibility list.
- Renamed the VS Code tab and view labels from Assistant to LLeM.
- Refreshed the compiled extension and webview bundles for the release.
- Packaged `release/llem-3.4.3.vsix`.
- Removed MCP and context-mode integration from runtime, prompts, and docs
- cleaned vault handling guidance and saved context-mode rules into the local vault
- refreshed package contents after the MCP removal
- Packaged `release/llem-3.4.2.vsix`.
- Removed LLeM branding from visible UI strings
- removed context-mode integration
- restored local TypeScript tooling so typecheck works again
- Packaged `release/llem-3.4.1.vsix`.
This release focuses on making agentic file edits visible, debuggable, and easier to trust when running local models such as Ollama Gemma-family models.
- Codex-style file change summaries in chat: When LLeM creates, edits, or deletes files, the chat now shows a compact change card with one row per file. Each row includes the action, file name, and line-level `+`/`-` counts so you can immediately see what changed without opening the filesystem first.
- Whole-turn change totals: Multi-file edits now include a footer such as `2 files changed +75 -20`, giving a clear overview of the total edit impact for the current agent action.
- Clickable changed files: File rows in the change summary can be clicked to open the affected file directly from the chat UI.
- Review Changes shortcut: The change summary includes a `Review changes` button that opens VS Code's Source Control view, making it faster to inspect the workspace diff after an agent run.
- Stronger edit failure visibility: If the model emits an `<edit_file>` action but none of the `<find>` blocks match the current file, LLeM now reports it as a clear failure: `Edit failed ... replacement 0/N`. This makes silent no-op edits much harder to miss.
- Immediate Action Report streaming: External action results are now posted into the live chat stream as soon as they happen. File edits, failed replacements, safety blocks, and terminal actions no longer wait until later continuation logic to become visible.
- Action Report preserved in the final answer: The final assistant message keeps the action report attached, so the user can scroll back later and still see exactly what LLeM tried, what succeeded, and what failed.
- Cleaner regenerate behavior: `Regenerate reply` now removes the previous assistant response from the chat UI before streaming the replacement, so regeneration feels like a true retry instead of an extra appended answer.
- Follow-up recovery guidance for local models: When an edit fails because the `<find>` text does not match, LLeM now gives the follow-up model turn a stronger system observation telling it to retry with exact current file content instead of explaining the failure away.
- Post-mortem logging for file actions: File create/edit/delete paths now write structured diagnostics for validation blocks, missing files, invalid edit bodies, zero-replacement edits, successful writes, and exceptions. These logs include trace IDs, parsed action counts, file paths, replacement metadata, and previews to help reconstruct what happened after a failed run.
- Safer testable logging outside VS Code: The logger now lazily loads the VS Code API and falls back to diagnostics-file logging during Node-based tests, so action logging can be covered without requiring an extension host.
- Regression coverage for edit metadata: Tests now verify that file action results include structured change metadata for created, edited, and deleted files.
- Bumped version and upgraded axios before VSIX build.
- Packaged `release/llem-3.3.35.vsix`.
- Fixed image lightbox close behavior so the top-right close button and backdrop dismiss reliably. Reduced action-history bloat by keeping only the most recent file context per turn and trimming file/web observation payloads.
- Packaged `release/llem-3.3.34.vsix`.
- Reduced action-history bloat by keeping only the most recent file context per turn and trimming file/web observation payloads. Improved live output masking, offline vision detection, image lightbox preview, and request startup logging.
- Packaged `release/llem-3.3.33.vsix`.
- Masked create_file and edit_file code from live output and now show progress-only streaming states. Improved offline vision-model detection using local Ollama manifests and added vision decision logging.
- Packaged `release/llem-3.3.32.vsix`.
- Improved offline vision-model detection using local Ollama manifests. Fixed capability checks to use the active engine endpoint and added vision decision logging.
- Packaged `release/llem-3.3.31.vsix`.
- Implemented Intelligent Repetition Guard with tiered backoff (3s, 10s, 30s), non-blocking queue scheduling, and automated retry orchestration with UI cooldown feedback.
- Packaged `release/llem-3.3.30.vsix`.
- Implemented File System Access Transparency with user-approved out-of-workspace operations and high-fidelity UI feedback.
- Packaged `release/llem-3.3.29.vsix`.
- Action transparency and loop prevention improvements
- Packaged `release/llem-3.3.28.vsix`.
- Added live stream metadata (duration, chunks, chars) to action progress UI
- Packaged `release/llem-3.3.27.vsix`.
- Implemented AI self-correction loop and Codex-style action progress visualization
- Packaged `release/llem-3.3.24.vsix`.
- B-1 fix: Repeated/watchdog-aborted responses are no longer pushed to the chat history. Previously, the aborted assistant message would linger in history and seed the next turn with a contaminated context, causing cascading repetition loops. Now the pipeline returns immediately without writing the bad response to history.
- B-2 fix: Consecutive `assistant → assistant` or `user → user` message pushes during agentic action loops are now de-duplicated. If a `continuation` user message arrives when the last history entry is already a `user` entry, the content is merged rather than creating a second entry.
- B-3 fix: Images are no longer forwarded to text-only models (gemma, llama, mistral, etc.). The model name is inspected for known vision indicators (`llava`, `vision`, `:vl`, `bakllava`, `moondream`, etc.) and a clear in-chat notice is shown when an image is skipped.
- B-4 fix: `RequestRetryGuard` fingerprints now use a normalized, punctuation-stripped 300-character prompt core instead of the raw prompt string. Rephrased retries of the same request are blocked even when the exact wording changes.
- FileStateGuard: New `src/fileStateGuard.ts` computes SHA-256 hashes before and after every `edit_file` action. A `no-effect` warning is surfaced when the file is unchanged (typically a `<find>` mismatch). After 3 consecutive no-effect edits on the same file, `loop-detected` is returned and further edits on that path are blocked via `ActionLoopGuard` (see the sketch after this entry).
- Packaged `release/llem-3.3.22.vsix`.
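For readers curious how a no-effect check like FileStateGuard can work, here is a minimal TypeScript sketch of the hash-compare idea (function names are illustrative, not the actual `src/fileStateGuard.ts` API):

```typescript
import { createHash } from "crypto";
import * as fs from "fs";

// Illustrative only: hash a file before and after an edit_file action.
function sha256OfFile(filePath: string): string {
  return createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}

function editTookEffect(filePath: string, applyEdit: () => void): boolean {
  const before = sha256OfFile(filePath);
  applyEdit();
  const after = sha256OfFile(filePath);
  return before !== after; // false means a "no-effect" edit, typically a <find> mismatch
}
```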
- Live stream output now shows raw AI text without any HTML/Markdown parsing during generation, so `<edit_file>`, `<find>`, and `<replace>` action tags are visible as-is while streaming.
- Final reply (after stream completes) continues to render as full Markdown with code highlighting, file badges, and action summaries.
- Removed the `sanitizeAssistantDisplayText()` call from the live `renderStreamNow()` path so the raw model output is never silently stripped mid-stream.
- Packaged `release/llem-3.3.21.vsix`.
- Hardened assistant output sanitization to prevent leaked action tags and scratchpad text in streamed replies
- Packaged `release/llem-3.3.20.vsix`.
- Fixed RepetitionWatchdog false positives that could truncate edit-file streams during repeated action-tag/code sequences. Added regression coverage for repeated closing-tag action streams.
- Packaged `release/llem-3.3.19.vsix`.
- Repackaged the current workspace state through the formal VSIX release flow.
- Packaged `release/llem-3.3.18.vsix`.
- Fixed RepetitionWatchdog false positives on markdown structure tokens so tables, fences, headers, list markers, blockquotes, and task markers no longer abort valid replies
- added regression tests for markdown-safe watchdog behavior
- Packaged `release/llem-3.3.17.vsix`.
- Added structured repetition abort handling, retry and action loop guards, safer file mutation validation, restored clickable editable files, and added default-browser opening for chat URL links
- Packaged `release/llem-3.3.16.vsix`.
- Bumped the extension version from `3.3.15` to `3.3.16`.
- Fixed Korean IME Enter handling so composing Hangul no longer sends a duplicated trailing message.
- Added composition-aware Enter submission logic with regression coverage for `isComposing` and IME confirm keycode `229`.
- Hardened stream loop handling so repetition detection is promoted into structured pipeline state instead of being treated like a normal completion.
- Stopped follow-up execution after repetition aborts, including watchdog-triggered stops and turn-to-turn repeated continuation loops.
- Added request fingerprinting and retry fencing so the same request cannot immediately restart after a repetition stop.
- Added action loop guarding so repeated `create_file` and `edit_file` patterns are blocked before they spin in place.
- Added file mutation guarding so the same file cannot be mutated twice at the same time during model-driven actions.
- Rejected incomplete `<find>`/`<replace>` edit bodies before disk write, preventing truncated edit actions from corrupting files.
- Rejected obviously truncated `create_file` output such as unbalanced fenced code blocks before writing files.
- Generalized plan-first enforcement for implementation requests, not just special design-guideline file names.
- Added implementation planning mode so code-generation requests are guided toward a compact file split and smaller Next.js/TypeScript steps first.
- Added a stronger post-processing guard that blocks action-tag execution if the model disobeys the initial plan-only response.
- Restored clickable editable-file behavior in chat by improving local file link validation, workspace-path resolution, and message rerendering after workspace file sync.
- Added default-browser opening for URL links in chat by routing external links through the extension host with `vscode.env.openExternal(...)`.
- Expanded tests for stream outcome handling, retry guards, action loop guards, file mutation guards, design planning mode, editable file resolution, external link routing, and file-safety edge cases.
- Fixed Korean IME Enter handling to prevent duplicate trailing messages
- added regression tests for composition-safe prompt submission
- Packaged `release/llem-3.3.15.vsix`.
- Added queued request pause/resume and reordering
- Added direct editing for queued items
- Expanded queue tests and stabilized package test suite
- Packaged `release/llem-3.3.14.vsix`.
- Fix stop button UI and edit banner visibility
- Packaged `release/llem-3.3.12.vsix`.
- Fix main-view layout causing input to overflow
- Packaged `release/llem-3.3.11.vsix`.
- Fix terminal executing logged messages
- Packaged `release/llem-3.3.10.vsix`.
- Fix immediate deletion of history items in UI
- Packaged `release/llem-3.3.9.vsix`.
- Fix edit banner visibility on initial chat load
- Packaged `release/llem-3.3.8.vsix`.
- Fix edit banner visibility on initial chat load
- Packaged `release/llem-3.3.7.vsix`.
- Fix terminal rendering, layout stability, and improve hardware summary quality
- Packaged `release/llem-3.3.7.vsix`.
- Implemented sequence-aware RepetitionWatchdog and improved action parsing to prevent infinite loops.
- Packaged `release/llem-3.3.6.vsix`.
- Fixed model output streaming issues with buffering and enhanced token extraction for reasoning fields.
- Packaged `release/llem-3.2.9.vsix`.
- Fixed AI response truncation, improved action tag stripping with smart quote support, and tuned model performance profiles for 26B models.
- Packaged `release/llem-3.2.7.vsix`.
- Enabled unlimited response length by setting predict token limits to -1. Added handling for unlimited output in both Ollama and LM Studio engines.
- Packaged `release/llem-3.2.5.vsix`.
- Increased token prediction limits to 4096+ to prevent response truncation. Fixed LM Studio max_tokens mapping.
- Packaged `release/llem-3.2.3.vsix`.
- Implemented repetition penalty for large models to prevent hallucination loops, fixed model selection persistence in settings.json, and added overwrite protection for user settings.
- Packaged `release/llem-3.2.1.vsix`.
- Revert to standard settings.json persistence and fix model selection overwrite issue
- Packaged `release/llem-3.1.9.vsix`.
- Added Codex-style message actions for user and assistant replies
- restored copy and edit flows for existing user messages
- added edit-in-new-branch composer state
- Packaged `release/llem-3.1.8.vsix`.
- Made the model dropdown persist the real active default model and pass runtime engine/model metadata into each request
- Removed the earlier-message editing banner and edit entrypoint from the chat UI so message composer stays in normal send mode
- Packaged `release/llem-3.1.7.vsix`.
- Added file-based diagnostics for stream debugging with per-request raw chunk capture and parsed token traces
- Logged final assistant text cleanup so empty replies can be traced from transport through final rendering
- Packaged `release/llem-3.1.6.vsix`.
- Improved stream parsing for object-shaped output chunks
- Fixed empty reply state when the model returned text in newer OpenAI-compatible stream formats
- Packaged `release/llem-3.1.5.vsix`.
- Fixed recurring empty replies by broadening stream parsing for additional LM Studio and Ollama response shapes
- Added raw stream preview logging when parsed output ends up empty so future payload mismatches are diagnosable instantly
- Packaged `release/llem-3.1.4.vsix`.
- Added model-aware performance presets for 26B-class local Ollama runs
- Added prompt budgeting and richer diagnostics for large local Gemma-family models
- Expanded the README with detailed performance profile guidance, 26B tuning notes, and diagnostics tips
- Packaged `release/llem-3.1.3.vsix`.
- Fixed empty-reply turns by hardening stream parsing for Ollama and LM Studio
- Flushed trailing stream buffers so the final token is not lost when a stream ends without a newline
- Saved assistant replies consistently into chat history so follow-up turns keep the right conversation context
- Updated the chat UI to distinguish truly empty replies from successful completed output
- Packaged `release/llem-3.1.2.vsix`.