
Conversation

@ItzNotABug
Member

@ItzNotABug ItzNotABug commented Jan 4, 2026

What does this PR do?

(Provide a description of what this PR does.)

Test Plan

(Write your test plan here. If you changed any code, please provide us with clear instructions on how you verified your changes work.)

Related PRs and Issues

(If this PR is related to any other PRs, or resolves or relates to any issues, link them all here.)

Have you read the Contributing Guidelines on issues?

(Write your answer here.)

Summary by CodeRabbit

  • New Features

    • Added optional OpenAI model configuration (_APP_ASSISTANT_OPENAI_MODEL with default gpt-4o)
    • Added health check endpoint for server status monitoring
    • New generic prompt API endpoint for enhanced functionality
  • Chores

    • Migrated development environment from pnpm to Bun
    • Updated all CI/CD workflows to use Bun instead of Node.js
    • Updated Docker build process to use Bun runtime
    • Updated LangChain dependencies to latest versions
    • Updated development setup documentation with Bun commands


@ItzNotABug ItzNotABug self-assigned this Jan 4, 2026
@coderabbitai

coderabbitai bot commented Jan 4, 2026

Walkthrough

This pull request migrates the project toolchain from Node.js/pnpm to Bun as the primary runtime and package manager. Changes include updating the Docker base image from node:18-alpine to oven/bun:alpine, replacing all npm/pnpm commands with Bun equivalents across GitHub Actions workflows, documentation, and package.json scripts. The application code is refactored to accommodate LangChain v0.x package structure updates and to replace the Express.js HTTP server with Bun's native Bun.serve API, including a custom streaming response handler. Package dependencies are reorganized, with LangChain split into separate modules (@langchain/core, @langchain/community, @langchain/openai, @langchain/textsplitters). Build arguments and environment variables are added to the Dockerfile for Git and website metadata, and ESLint configuration is updated to declare Bun as a readonly global.
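For orientation, here is a minimal sketch of the Bun.serve pattern the rewrite adopts. The port fallback, route path, and handler body are illustrative assumptions, not the PR's exact code:

```js
// Sketch: Bun's native HTTP server replacing an Express app.
// The /v1/health route mirrors the health check endpoint mentioned
// in the summary; its path and payload here are assumptions.
const port = process.env.PORT ?? 3003;

Bun.serve({
  port,
  async fetch(req) {
    const url = new URL(req.url);
    if (url.pathname === "/v1/health") {
      // Plain Response objects replace Express's res.json()/res.send().
      return Response.json({ status: "pass" });
    }
    return new Response("Not Found", { status: 404 });
  },
});

console.log(`Listening on port ${port}`);
```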

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Details

Key Changes by Component

Infrastructure & Configuration:

  • Docker migration introduces new base image, build-time ARGs, environment variable handling, and a modified build pipeline targeting Bun's compiler and runtime.
  • GitHub Actions workflows (.github/workflows/format.yml, lint.yml) replace Node.js setup with Bun and update all dependency/script commands accordingly.
  • .dockerignore expanded with comprehensive ignore patterns beyond the single "sources" entry.
  • ESLint configuration extended to recognize Bun as a global alongside Node.js globals.

Package & Dependencies:

  • package.json scripts converted from pnpm/npm to Bun commands; dependencies restructured to use @langchain v0.x modular packages and introduce @types/bun.
  • Removed Express, body-parser, cors, nodemon, and peggy; added @langchain packages (core, community, openai, textsplitters).

Source Code:

  • src/documents.js updated to import from @langchain/core and @langchain/textsplitters instead of langchain root.
  • src/embeddings.js introduces a new createChatModel helper, refactors getOpenAIChat signature (removal of systemPrompt parameter), and updates ChatOpenAI initialization with revised property names and prompt construction via ChatPromptTemplate.
  • src/main.js undergoes a substantial rewrite replacing Express.js with Bun.serve, introducing a createStreamingResponse factory for token emission, implementing CORS and preflight handling directly in the fetch handler, and restructuring endpoints to use retriever.invoke instead of getRelevantDocuments.
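A plausible shape for such a factory, as a hedged sketch; the producer-callback signature is an assumption drawn from the review comments below, and the actual implementation may differ:

```js
// Sketch of a createStreamingResponse-style factory: the producer
// receives an onToken callback, and the factory returns a
// ReadableStream suitable for passing to new Response(stream).
const createStreamingResponse = (producer) => {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      const onToken = (token) => controller.enqueue(encoder.encode(token));
      try {
        await producer(onToken);
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    },
  });
};
```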

Documentation:

  • README.md and CONTRIBUTING.md updated to reflect Bun installation and script commands; README adds optional _APP_ASSISTANT_OPENAI_MODEL configuration and a new curl testing example.
  • docker-compose.yml adds _APP_ASSISTANT_OPENAI_MODEL environment variable.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Title check — ❓ Inconclusive: The title "Experiment: Bun" is vague and generic, using non-descriptive terms that don't convey meaningful information about the substantial changes in the changeset.
    Resolution: Consider a more descriptive title such as "Migrate from Node.js/pnpm to Bun runtime" or "Switch package manager and runtime to Bun" to clearly communicate the primary changes.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@ItzNotABug ItzNotABug marked this pull request as ready for review January 4, 2026 13:45

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (2)
Dockerfile (1)

49-52: The sources directory is required at runtime and cannot be removed.

The hardcoded paths in src/documents.js (e.g., ./sources/website/src/routes/docs/**/*.markdoc and ./sources/references/**/*.md) confirm that the sources directory is accessed at runtime. However, the optimization suggestion for node_modules remains valid: since the build bundles the code with --external hnswlib-node, only that native module (and its direct dependencies) are strictly needed at runtime. The full set of node_modules includes langchain's extensive optional peer dependencies that may not be used by the bundled application. If size optimization is a priority, consider analyzing the actual imports in the bundled dist/main.js and copying only the necessary dependencies.

src/embeddings.js (1)

40-42: Reconsider the getOpenAIChat wrapper.

This function now simply wraps createChatModel, and src/main.js still uses it in the generic prompt endpoint. Consider whether the indirection is necessary, or whether createChatModel should be exported directly instead.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bffb784 and 2c356f8.

⛔ Files ignored due to path filters (2)
  • bun.lock is excluded by !**/*.lock
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (12)
  • .dockerignore
  • .github/workflows/format.yml
  • .github/workflows/lint.yml
  • CONTRIBUTING.md
  • Dockerfile
  • README.md
  • docker-compose.yml
  • eslint.config.js
  • package.json
  • src/documents.js
  • src/embeddings.js
  • src/main.js
🧰 Additional context used
🧬 Code graph analysis (1)
src/main.js (1)
src/embeddings.js (3)
  • prompt (51-54)
  • getRagChain (48-57)
  • getOpenAIChat (40-42)
🔇 Additional comments (26)
.dockerignore (1)

1-47: LGTM! Comprehensive ignore patterns improve Docker build hygiene.

The expanded .dockerignore file is well-structured and covers all the necessary categories: documentation, build artifacts, alternative package manager lock files, tests, IDE files, logs, environment files, OS-specific files, CI configuration, and Docker-related files. This will reduce the Docker context size and prevent unnecessary files from being included in the image.

CONTRIBUTING.md (4)

17-22: LGTM! Documentation updated to reflect Bun toolchain.

The contributing guide has been correctly updated to replace pnpm commands with Bun equivalents, aligning with the broader toolchain migration in this PR.


32-32: LGTM! Development command updated.

The dev command now uses bun run dev, consistent with the Bun migration.


44-44: LGTM! Test command reference updated.

The reference to the test script correctly uses bun run test.


65-66: LGTM! Formatting and linting commands updated.

Both commands now use Bun, consistent with the toolchain migration across the repository.

src/documents.js (1)

3-4: No action required. The LangChain import paths are correct and compatible with the installed package versions.

The imports use the proper namespaced structure for LangChain v0.3:

  • @langchain/core@^0.3.25 is installed and exports Document from the documents subpath
  • @langchain/textsplitters@^0.1.0 is installed and exports MarkdownTextSplitter

Both imports and package versions are compatible.
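For reference, the verified import style looks like this; the splitter options and usage lines are illustrative placeholders, not the project's actual configuration:

```js
import { Document } from "@langchain/core/documents";
import { MarkdownTextSplitter } from "@langchain/textsplitters";

// Split markdown into chunked Document instances; chunk sizes here
// are placeholder values.
const splitter = new MarkdownTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
const chunks = await splitter.createDocuments(["# Title\n\nSome markdown content..."]);

// Documents can also be constructed directly.
const extra = new Document({ pageContent: "standalone content", metadata: {} });
console.log(chunks.length, extra.pageContent);
```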

README.md (3)

11-25: LGTM! Clear migration to Bun tooling.

The installation instructions accurately reflect the migration from pnpm to Bun, with consistent command references and clear documentation source information.


31-37: LGTM! Accurate usage documentation.

The usage instructions correctly document the OpenAI configuration options (API key and model selection with gpt-4o default) and update the dev command to use Bun.


41-45: Good addition! Helpful testing example.

The curl example provides a practical way for developers to test the server endpoint.
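For readers without curl at hand, the same check can be expressed with fetch. The port, endpoint path, and request body shape below are assumptions inferred from the endpoints discussed later in this review:

```js
// Sketch: POST a prompt and stream the plain-text response to stdout.
const res = await fetch("http://localhost:3003/v1/models/assistant/prompt", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt: "How do I create a new user?" }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```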

eslint.config.js (1)

7-14: LGTM! Correct ESLint configuration for Bun runtime.

Adding Bun as a readonly global ensures ESLint recognizes Bun-specific APIs without flagging them as undefined, which aligns with the broader migration to Bun runtime.

package.json (3)

8-14: LGTM! Complete script migration to Bun.

All npm scripts have been correctly updated to use Bun commands, including the --watch flag for development hot-reloading as a replacement for nodemon.


34-34: LGTM! Correct Bun-specific configuration.

The addition of @types/bun provides proper TypeScript support, and the trustedDependencies array correctly identifies hnswlib-node as a native module that requires build permissions in Bun.

Also applies to: 39-41


19-30: LangChain v0.x migration is correctly implemented.

The codebase properly uses the modular LangChain v0.3.x architecture. While imports span both the monolithic langchain package (for loadQAStuffChain from langchain/chains) and the new modular packages (@langchain/core, @langchain/community, @langchain/openai, @langchain/textsplitters), this hybrid approach is supported in v0.3.x and all imports are correctly configured.
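As a sketch of that hybrid style (the chain wiring below is illustrative, not the project's exact code):

```js
// Monolithic package for the QA chain...
import { loadQAStuffChain } from "langchain/chains";
// ...modular packages for the model and prompt.
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const model = new ChatOpenAI({ model: "gpt-4o" });
const prompt = ChatPromptTemplate.fromTemplate("{context}\n\nQuestion: {question}");
const chain = loadQAStuffChain(model, { prompt });
```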

Dockerfile (4)

1-8: LGTM! Appropriate base image and dependencies.

The migration to oven/bun:alpine with necessary build tools (python3, make, g++, git) supports both native module compilation (hnswlib-node) and source fetching operations.


14-16: LGTM! Correct dependency installation.

Using --frozen-lockfile ensures reproducible builds, and --production appropriately excludes development dependencies in the builder stage.


21-30: LGTM! Flexible build configuration.

The build arguments for Git and website URLs provide appropriate flexibility for different deployment environments while maintaining sensible defaults.


36-41: LGTM! Proper Bun build configuration.

The build flags are appropriate:

  • --target bun ensures Bun runtime compatibility
  • --minify reduces bundle size
  • --external hnswlib-node correctly excludes the native module from bundling, allowing it to load at runtime
docker-compose.yml (1)

10-10: LGTM! Consistent with Dockerfile and documentation.

Adding the _APP_ASSISTANT_OPENAI_MODEL environment variable allows runtime configuration of the OpenAI model, consistent with the Dockerfile default and README.md documentation.

src/embeddings.js (4)

1-5: LGTM on import updates.

The migration to scoped LangChain packages (@langchain/community, @langchain/openai, @langchain/core) aligns with LangChain's v0.3.x module structure.


11-24: LGTM on the new createChatModel helper.

Clean extraction of the ChatOpenAI instantiation logic with streaming support. The callback-based token handling via handleLLMNewToken is correctly configured.
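A hedged sketch of such a helper, assuming the environment variable names documented elsewhere in this PR; the option set beyond streaming and callbacks is an assumption:

```js
import { ChatOpenAI } from "@langchain/openai";

// Streaming ChatOpenAI instance that forwards each generated token
// to the caller-supplied callback.
const createChatModel = (onToken) =>
  new ChatOpenAI({
    model: process.env._APP_ASSISTANT_OPENAI_MODEL ?? "gpt-4o",
    apiKey: process.env._APP_ASSISTANT_OPENAI_API_KEY,
    streaming: true,
    callbacks: [
      {
        handleLLMNewToken(token) {
          onToken?.(token);
        },
      },
    ],
  });
```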


26-35: LGTM on embeddings configuration update.

The change to apiKey aligns with the @langchain/openai v0.3.15 API, as confirmed in the official documentation.


48-56: No action needed. The ChatPromptTemplate correctly uses {context} and {question} placeholders, which match the default expectations of loadQAStuffChain. The chain will convert input_documents to the context variable as intended.
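Concretely, a prompt compatible with those defaults can be built like this; the message text is a placeholder, not the PR's actual prompt:

```js
import { ChatPromptTemplate } from "@langchain/core/prompts";

// {context} and {question} match loadQAStuffChain's default input keys.
const makePrompt = (systemPrompt) =>
  ChatPromptTemplate.fromMessages([
    ["system", systemPrompt],
    ["human", "Context:\n{context}\n\nQuestion: {question}"],
  ]);
```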

src/main.js (4)

16-29: LGTM on the streaming response factory.

Clean abstraction for creating streaming responses with proper error handling via controller.error().


31-46: LGTM on CORS and preflight handling.

The CORS configuration is appropriately permissive for a development/internal service. Preflight requests are handled correctly.
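For illustration, a typical shape for this pattern; the header values here are conventional defaults, not necessarily the PR's exact set:

```js
const corsHeaders = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};

const handle = async (req) => {
  // Answer CORS preflight before any routing.
  if (req.method === "OPTIONS") {
    return new Response(null, { status: 204, headers: corsHeaders });
  }
  // ...route handling; every response spreads corsHeaders in.
  return new Response("OK", { headers: corsHeaders });
};
```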


68-72: LGTM on document retrieval update.

Using retriever.invoke(prompt) aligns with LangChain's updated retriever API (replacing deprecated getRelevantDocuments).


116-123: Race condition: server accepts requests before retriever is initialized.

Bun.serve starts immediately and can accept requests while initializeRetriever() is still running. The /v1/models/assistant/prompt endpoint correctly checks for !retriever, but this code pattern is unusual—typically initialization completes before the server starts listening.

Consider restructuring to initialize before serving:

🔎 Proposed restructure

```diff
-Bun.serve({
-  port,
-  async fetch(req) {
+const startServer = async () => {
+  console.log("Initializing search index...");
+  try {
+    retriever = await initializeRetriever();
+    console.log("Search index initialized");
+  } catch (e) {
+    console.error("Failed to initialize search index:", e);
+    process.exit(1);
+  }
+
+  Bun.serve({
+    port,
+    async fetch(req) {
       // ... existing handler code ...
-  },
-});
-
-console.log(`Started server on port: ${port}`);
-console.log("Initializing search index...");
-try {
-  retriever = await initializeRetriever();
-  console.log("Search index initialized");
-} catch (e) {
-  console.error(e);
-}
+    },
+  });
+
+  console.log(`Started server on port: ${port}`);
+};
+
+startServer();
```

Comment on lines 16 to 25

```diff
       - name: Setup Bun
         uses: oven-sh/setup-bun@v2
         with:
-          node-version: '18'
+          bun-version: latest

       - name: Install dependencies
-        run: npm install
+        run: bun install

       - name: Format
-        run: npm run format
+        run: bun run format
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

latest stable version of Bun runtime

💡 Result:

The latest stable Bun release is v1.3.5 (released December 17, 2025). [1]

To verify locally: run bun --version. [2]

Sources:

  • Bun release notes — v1.3.5 (Dec 17, 2025). [1]
  • Bun upgrade / version guide (how to check with bun --version). [2]

Pin the Bun version for reproducible CI builds.

Using bun-version: latest can lead to non-deterministic builds when new Bun versions are released. A breaking change or behavioral difference in a future Bun release could cause unexpected CI failures or inconsistent formatting results across different workflow runs. Pin to the latest stable version (currently v1.3.5).

🔎 Recommended fix to pin Bun version

```diff
       - name: Setup Bun
         uses: oven-sh/setup-bun@v2
         with:
-          bun-version: latest
+          bun-version: "1.3.5"

       - name: Install dependencies
         run: bun install

       - name: Format
         run: bun run format
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```yaml
      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
        with:
          bun-version: "1.3.5"

      - name: Install dependencies
        run: bun install

      - name: Format
        run: bun run format
```
🤖 Prompt for AI Agents
In .github/workflows/format.yml around lines 16 to 25, the workflow sets bun-version to "latest", which makes CI non-deterministic; update the setup step to pin bun-version: "1.3.5" (the current stable) instead of latest so the workflow installs a fixed Bun release and produces reproducible builds and formatting results.

Comment on lines 16 to 25

```diff
       - name: Setup Bun
         uses: oven-sh/setup-bun@v2
         with:
-          node-version: '18'
+          bun-version: latest

       - name: Install dependencies
-        run: npm install
+        run: bun install

       - name: Lint
-        run: npm run lint
+        run: bun run lint
```

⚠️ Potential issue | 🟠 Major

Pin the Bun version for reproducible CI builds.

Using bun-version: latest can lead to non-deterministic builds when new Bun versions are released. A breaking change or behavioral difference in a future Bun release could cause unexpected CI failures or inconsistent linting results across different workflow runs.

🔎 Recommended fix to pin Bun version

```diff
       - name: Setup Bun
         uses: oven-sh/setup-bun@v2
         with:
-          bun-version: latest
+          bun-version: "1.3.5"

       - name: Install dependencies
         run: bun install

       - name: Lint
         run: bun run lint
```

Consider using the same pinned version across all workflows for consistency.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```yaml
      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
        with:
          bun-version: "1.3.5"

      - name: Install dependencies
        run: bun install

      - name: Lint
        run: bun run lint
```
🤖 Prompt for AI Agents
In .github/workflows/lint.yml around lines 16 to 25, the workflow currently uses
bun-version: latest; change this to a pinned Bun version (use a concrete semver
string) or reference a repository-wide variable/env that holds the pinned
version, update all other GitHub workflows to use the same pinned value for
consistency, and add a short note in README/CI docs to record the chosen Bun
version and the process for bumping it (bump, test locally/CI, then update the
pinned value).

Comment on lines +33 to +34

```dockerfile
ARG _APP_ASSISTANT_OPENAI_API_KEY
ENV _APP_ASSISTANT_OPENAI_API_KEY=${_APP_ASSISTANT_OPENAI_API_KEY}
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash

echo "=== Checking OPENAI_API_KEY usage in scripts ==="
rg -n "OPENAI_API_KEY" --max-count=20

echo -e "\n=== Checking Dockerfile context around lines 30-35 ==="
head -n 50 Dockerfile | tail -n +25

echo -e "\n=== Checking prod stage (around line 54) ==="
tail -n +45 Dockerfile | head -n 20

echo -e "\n=== Checking fetch-sources script ==="
if [ -f "scripts/fetch-sources.sh" ]; then
  cat scripts/fetch-sources.sh
elif [ -f "scripts/fetch-sources.js" ]; then
  head -n 50 scripts/fetch-sources.js
else
  find . -name "*fetch-sources*" -type f 2>/dev/null | head -5
fi

echo -e "\n=== Checking for build-embeddings references ==="
rg -n "build-embeddings" --max-count=20
```

Repository: appwrite/assistant

Length of output: 2130


🏁 Script executed:

```bash
#!/bin/bash

echo "=== Checking if fetch-sources uses API key ==="
find . -name "fetch-sources*" -type f 2>/dev/null | xargs cat 2>/dev/null | head -50

echo -e "\n=== Checking if API key is actually used in runtime ==="
cat -n src/embeddings.js | head -40

echo -e "\n=== Checking package.json build scripts ==="
grep -A 5 "scripts" package.json | head -20
```

Repository: appwrite/assistant

Length of output: 2083


Remove the OpenAI API key from build arguments—it's only needed at runtime.

The API key is passed as a build ARG and then set as an ENV, but it's never actually consumed during the build process. Embeddings are generated at runtime in src/embeddings.js, not during Docker image build. The build-time ARG is unnecessary and could expose the key in build history.

Recommendation: Remove lines 33-34 and rely only on the API key being provided at container runtime (which is how it's already reset in the prod stage on line 54).

🤖 Prompt for AI Agents
In Dockerfile around lines 33-34, remove the build ARG and ENV for
_APP_ASSISTANT_OPENAI_API_KEY because the key is not used during image build and
should only be provided at runtime; delete the two lines declaring ARG
_APP_ASSISTANT_OPENAI_API_KEY and ENV
_APP_ASSISTANT_OPENAI_API_KEY=${_APP_ASSISTANT_OPENAI_API_KEY} so the secret is
not leaked into build history and rely on the existing runtime provision (as
done in the prod stage at line 54).

Comment on lines +65 to +66

```js
const body = await req.json();
const { prompt } = body;
```

⚠️ Potential issue | 🟡 Minor

Missing error handling for malformed JSON body.

If req.json() fails (e.g., invalid JSON or missing body), the server will throw an unhandled exception. Consider wrapping in try-catch.

🔎 Proposed fix

```diff
-      const body = await req.json();
-      const { prompt } = body;
+      let body;
+      try {
+        body = await req.json();
+      } catch {
+        return new Response("Invalid JSON body", {
+          status: 400,
+          headers: corsHeaders,
+        });
+      }
+      const { prompt } = body;
+
+      if (!prompt) {
+        return new Response("Missing 'prompt' field", {
+          status: 400,
+          headers: corsHeaders,
+        });
+      }
```
🤖 Prompt for AI Agents
In src/main.js around lines 65 to 66, the call to await req.json() has no error
handling so a malformed or missing JSON body will throw and crash the request;
wrap the parsing in a try-catch, on parse errors return a 400 response with a
clear error message (e.g., "Invalid JSON body"), and after successful parse
validate that prompt exists and is a string, returning 400 if missing or invalid
before proceeding.

Comment on lines +74 to +91

```js
const stream = createStreamingResponse(async (onToken) => {
  const chain = await getRagChain(onToken, SYSTEM_PROMPT);
  await chain.invoke({
    input_documents: relevantDocuments,
    question: prompt,
  });

  if (sources.size > 0) {
    onToken("\n\nSources:\n");
    for (const sourceUrl of sources) {
      onToken("- " + sourceUrl + "\n");
    }
  }
});

return new Response(stream, {
  headers: { ...corsHeaders, "Content-Type": "text/plain" },
});
```

⚠️ Potential issue | 🟡 Minor

Streaming errors are not propagated to the HTTP response.

If getRagChain or chain.invoke throws, controller.error(error) is called, but this may not translate to a proper HTTP error for the client—the response stream may just terminate abruptly. Consider catching errors within the callback and emitting an error message to the stream before closing, or returning a non-streaming error response for critical failures.
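A hedged sketch of the suggested approach, reusing the names from the snippet above; the fallback message is a placeholder:

```js
const stream = createStreamingResponse(async (onToken) => {
  try {
    const chain = await getRagChain(onToken, SYSTEM_PROMPT);
    await chain.invoke({
      input_documents: relevantDocuments,
      question: prompt,
    });
  } catch (error) {
    // Log server-side, but surface a readable message to the client
    // instead of aborting the stream via controller.error().
    console.error("RAG chain failed:", error);
    onToken("\n\nSorry, something went wrong while generating a response.");
  }
});
```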

Comment on lines +96 to +97

```js
const body = await req.json();
const { prompt } = body;
```

⚠️ Potential issue | 🟡 Minor

Same JSON parsing vulnerability exists here.

The generic prompt endpoint also lacks error handling for malformed JSON, similar to the assistant endpoint. Apply the same fix.

🤖 Prompt for AI Agents
In src/main.js around lines 96-97, the code directly calls await req.json() and
destructures prompt without handling malformed JSON or missing fields; wrap the
JSON parsing and destructuring in a try/catch, return a 400 Bad Request with a
clear error message when JSON.parse fails or when prompt is absent/invalid, and
only proceed when prompt is a non-empty string (matching the fix applied to the
assistant endpoint).
