Merged
360 changes: 158 additions & 202 deletions bun.lock

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions package.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"name": "@elizaos/plugin-knowledge",
"description": "Plugin for Knowledge",
"version": "1.5.15",
"version": "1.5.16",
"type": "module",
"main": "dist/index.js",
"module": "dist/index.js",
@@ -30,7 +30,7 @@
"@ai-sdk/anthropic": "^2.0.17",
"@ai-sdk/google": "^2.0.14",
"@ai-sdk/openai": "^2.0.32",
"@elizaos/core": "^1.6.4",
"@elizaos/core": "^1.7.0",
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for ProviderRateLimits usage in the codebase
rg "ProviderRateLimits" --type ts --type js -n

Repository: elizaos-plugins/plugin-knowledge

Length of output: 775


🏁 Script executed:

# Look for any breaking change documentation or changelog references
fd -e md | xargs grep -l "@elizaos/core" 2>/dev/null | head -5

Repository: elizaos-plugins/plugin-knowledge

Length of output: 58


🏁 Script executed:

# Check the actual diff context to see what changed in package.json
git diff package.json | head -50

Repository: elizaos-plugins/plugin-knowledge

Length of output: 58


🌐 Web query:

@elizaos/core v1.7.0 changelog breaking changes release notes

💡 Result:

I can't find a published @elizaos/core v1.7.0 release or changelog. The package on npm shows latest published v1.5.7. [1] The elizaOS GitHub releases page lists up through v1.6.5 (alpha) tags but no v1.7.0 release. [2]

If you want, I can:

  • search commit/compare logs for changes planned for v1.7.0 (if in a branch or tag), or
  • watch the repo and notify when v1.7.0 is released.

Sources:
[1] npm: @elizaos/core package page.
[2] GitHub: elizaOS/eliza Releases.


🏁 Script executed:

cat package.json | grep -A 2 -B 2 "@elizaos/core"

Repository: elizaos-plugins/plugin-knowledge

Length of output: 257


🏁 Script executed:

# Check git log to see if there's context about this change
git log --oneline -n 20 -- package.json

Repository: elizaos-plugins/plugin-knowledge

Length of output: 164


🌐 Web query:

@elizaos/core npm package version history latest

💡 Result:

Latest published version: 1.5.8 (published ~7 hours ago). [1][2]

Sources:

  • npm package page showing v1.5.8 as current (last publish ~7 hours ago). [1]
  • Ecosyste.ms registry summary for @elizaos/core (last synced ~1 hour ago). [2]

@elizaos/core v1.7.0 does not exist on npm; the version should be updated.

The specified version ^1.7.0 is not a published release; the latest available version is v1.5.8, so npm install will fail. Update the dependency to a published release (e.g., ^1.5.8), or confirm whether v1.7.0 is a pre-release or internal version that should be used instead.

🤖 Prompt for AI Agents
In package.json around line 33, the dependency "@elizaos/core": "^1.7.0"
references a non-published version; update the version to a published release
(for example "^1.5.8") or the correct available tag, then run npm install to
verify; if v1.7.0 is an internal/pre-release, replace the entry with the proper
registry URL or tag and document the dependency choice in the PR.
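As a sketch of the prompt's suggested fix (assuming ^1.5.8 is the intended published release), the dependency entry would become:

```json
{
  "dependencies": {
    "@elizaos/core": "^1.5.8"
  }
}
```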

"@openrouter/ai-sdk-provider": "^1.2.0",
"@tanstack/react-query": "^5.51.1",
"ai": "^5.0.48",
94 changes: 42 additions & 52 deletions src/config.ts
@@ -88,6 +88,14 @@ export function validateModelConfig(runtime?: IAgentRuntime): ModelConfig {

LOAD_DOCS_ON_STARTUP: parseBooleanEnv(getSetting('LOAD_DOCS_ON_STARTUP')),
CTX_KNOWLEDGE_ENABLED: ctxKnowledgeEnabled,

// Rate limiting settings - disable for fast uploads with APIs without limits
// High defaults optimized for Vercel gateway / high-throughput APIs
RATE_LIMIT_ENABLED: parseBooleanEnv(getSetting('RATE_LIMIT_ENABLED', 'true')),
MAX_CONCURRENT_REQUESTS: getSetting('MAX_CONCURRENT_REQUESTS', '150'),
REQUESTS_PER_MINUTE: getSetting('REQUESTS_PER_MINUTE', '300'),
TOKENS_PER_MINUTE: getSetting('TOKENS_PER_MINUTE', '750000'),
BATCH_DELAY_MS: getSetting('BATCH_DELAY_MS', '100'),
});
validateConfigRequirements(config, assumePluginOpenAI);
return config;
@@ -181,69 +189,51 @@ function validateConfigRequirements(config: ModelConfig, assumePluginOpenAI: boo
* Returns rate limit information for the configured providers
* Checks BOTH TEXT_PROVIDER (for LLM calls) and EMBEDDING_PROVIDER
*
* Set RATE_LIMIT_ENABLED=false to disable all rate limiting for fast uploads.
* This is useful when using APIs without rate limits (e.g., self-hosted models).
*
* @param runtime The agent runtime to get settings from
* @returns Rate limit configuration for the current providers
*/
export async function getProviderRateLimits(runtime?: IAgentRuntime): Promise<ProviderRateLimits> {
const config = validateModelConfig(runtime);

// Helper function to get setting from runtime or fallback to process.env
const getSetting = (key: string, defaultValue: string) => {
if (runtime) {
return runtime.getSetting(key) || defaultValue;
}
return process.env[key] || defaultValue;
};

// Get rate limit values from runtime settings or use defaults
const maxConcurrentRequests = parseInt(getSetting('MAX_CONCURRENT_REQUESTS', '30'), 10);
const requestsPerMinute = parseInt(getSetting('REQUESTS_PER_MINUTE', '60'), 10);
const tokensPerMinute = parseInt(getSetting('TOKENS_PER_MINUTE', '150000'), 10);
// Get rate limit values from validated config
const rateLimitEnabled = config.RATE_LIMIT_ENABLED;
const maxConcurrentRequests = config.MAX_CONCURRENT_REQUESTS;
const requestsPerMinute = config.REQUESTS_PER_MINUTE;
const tokensPerMinute = config.TOKENS_PER_MINUTE;
const batchDelayMs = config.BATCH_DELAY_MS;

// CRITICAL FIX: Check TEXT_PROVIDER first since that's where rate limits are typically hit
const primaryProvider = config.TEXT_PROVIDER || config.EMBEDDING_PROVIDER;

if (!rateLimitEnabled) {
logger.info(
`[Document Processor] Rate limiting DISABLED - unlimited throughput mode (concurrent: ${maxConcurrentRequests}, batch delay: ${batchDelayMs}ms)`
);
return {
maxConcurrentRequests,
requestsPerMinute: Number.MAX_SAFE_INTEGER,
tokensPerMinute: Number.MAX_SAFE_INTEGER,
provider: primaryProvider || 'unlimited',
rateLimitEnabled: false,
batchDelayMs,
};
}

logger.debug(
`[Document Processor] Rate limiting for ${primaryProvider}: ${requestsPerMinute} RPM, ${tokensPerMinute} TPM, ${maxConcurrentRequests} concurrent`
`[Document Processor] Rate limiting for ${primaryProvider}: ${requestsPerMinute} RPM, ${tokensPerMinute} TPM, ${maxConcurrentRequests} concurrent, ${batchDelayMs}ms batch delay`
);

// Provider-specific rate limits based on actual usage
switch (primaryProvider) {
case 'anthropic':
// Anthropic Claude rate limits - use user settings (they know their tier)
return {
maxConcurrentRequests,
requestsPerMinute,
tokensPerMinute,
provider: 'anthropic',
};

case 'openai':
// OpenAI typically allows 150,000 tokens per minute for embeddings
// and up to 3,000 RPM for Tier 4+ accounts
return {
maxConcurrentRequests,
requestsPerMinute: Math.min(requestsPerMinute, 3000),
tokensPerMinute: Math.min(tokensPerMinute, 150000),
provider: 'openai',
};

case 'google':
// Google's default is 60 requests per minute
return {
maxConcurrentRequests,
requestsPerMinute: Math.min(requestsPerMinute, 60),
tokensPerMinute: Math.min(tokensPerMinute, 100000),
provider: 'google',
};

default:
// Use user-configured values for unknown providers
return {
maxConcurrentRequests,
requestsPerMinute,
tokensPerMinute,
provider: primaryProvider || 'unknown',
};
}
// Use user-configured values directly - trust the user's gateway/API setup
// No provider-specific caps to allow high-throughput setups (e.g., Vercel gateway)
return {
maxConcurrentRequests,
requestsPerMinute,
tokensPerMinute,
provider: primaryProvider || 'unknown',
rateLimitEnabled: true,
batchDelayMs,
};
}
63 changes: 45 additions & 18 deletions src/document-processor.ts
@@ -41,9 +41,10 @@ function getCtxKnowledgeEnabled(runtime?: IAgentRuntime): boolean {
let rawValue: string | undefined;

if (runtime) {
rawValue = runtime.getSetting('CTX_KNOWLEDGE_ENABLED');
const settingValue = runtime.getSetting('CTX_KNOWLEDGE_ENABLED');
rawValue = typeof settingValue === 'string' ? settingValue : settingValue?.toString();
// CRITICAL FIX: Use trim() and case-insensitive comparison
const cleanValue = rawValue?.toString().trim().toLowerCase();
const cleanValue = rawValue?.trim().toLowerCase();
result = cleanValue === 'true';
source = 'runtime.getSetting()';
} else {
@@ -146,16 +147,23 @@ export async function processFragmentsSynchronously({
logger.info(`[Document Processor] "${docName}": Split into ${chunks.length} chunks`);

// Get provider limits for rate limiting
const providerLimits = await getProviderRateLimits();
const CONCURRENCY_LIMIT = Math.min(30, providerLimits.maxConcurrentRequests || 30);
const providerLimits = await getProviderRateLimits(runtime);
const CONCURRENCY_LIMIT = providerLimits.maxConcurrentRequests || 30;
const rateLimiter = createRateLimiter(
providerLimits.requestsPerMinute || 60,
providerLimits.tokensPerMinute
providerLimits.tokensPerMinute,
providerLimits.rateLimitEnabled
);

logger.debug(
`[Document Processor] Rate limits: ${providerLimits.requestsPerMinute} RPM, ${providerLimits.tokensPerMinute} TPM (${providerLimits.provider}, concurrency: ${CONCURRENCY_LIMIT})`
);
if (!providerLimits.rateLimitEnabled) {
logger.info(
`[Document Processor] UNLIMITED MODE: concurrency ${CONCURRENCY_LIMIT}, batch delay ${providerLimits.batchDelayMs}ms`
);
} else {
logger.debug(
`[Document Processor] Rate limits: ${providerLimits.requestsPerMinute} RPM, ${providerLimits.tokensPerMinute} TPM (${providerLimits.provider}, concurrency: ${CONCURRENCY_LIMIT})`
);
}

// Process and save fragments
const { savedCount, failedCount } = await processAndSaveFragments({
@@ -171,6 +179,7 @@
concurrencyLimit: CONCURRENCY_LIMIT,
rateLimiter,
documentTitle,
batchDelayMs: providerLimits.batchDelayMs,
});

// Report results with summary
@@ -352,6 +361,7 @@ async function processAndSaveFragments({
concurrencyLimit,
rateLimiter,
documentTitle,
batchDelayMs = 500,
}: {
runtime: IAgentRuntime;
documentId: UUID;
@@ -365,6 +375,7 @@
concurrencyLimit: number;
rateLimiter: (estimatedTokens?: number) => Promise<void>;
documentTitle?: string;
batchDelayMs?: number;
}): Promise<{
savedCount: number;
failedCount: number;
@@ -460,9 +471,10 @@
}
}

// Add a small delay between batches to prevent overwhelming the API
if (i + concurrencyLimit < chunks.length) {
await new Promise((resolve) => setTimeout(resolve, 500));
// Add a configurable delay between batches to prevent overwhelming the API
// Set BATCH_DELAY_MS=0 to disable delay for maximum throughput
if (i + concurrencyLimit < chunks.length && batchDelayMs > 0) {
await new Promise((resolve) => setTimeout(resolve, batchDelayMs));
}
}

@@ -623,10 +635,11 @@ async function generateContextsInBatch(
return [];
}

const providerLimits = await getProviderRateLimits();
const providerLimits = await getProviderRateLimits(runtime);
const rateLimiter = createRateLimiter(
providerLimits.requestsPerMinute || 60,
providerLimits.tokensPerMinute
providerLimits.tokensPerMinute,
providerLimits.rateLimitEnabled
);

// Get active provider from validateModelConfig
@@ -685,11 +698,13 @@
} else {
// Fall back to runtime.useModel (original behavior)
if (item.usesCaching) {
// Use the newer caching approach with separate document
// Note: runtime.useModel doesn't support cacheDocument/cacheOptions
// Use the newer caching approach - embed system prompt into main prompt
// Note: runtime.useModel doesn't support separate system prompt
const combinedPrompt = item.systemPrompt
? `${item.systemPrompt}\n\n${item.promptText!}`
: item.promptText!;
return await runtime.useModel(ModelType.TEXT_LARGE, {
prompt: item.promptText!,
system: item.systemPrompt,
prompt: combinedPrompt,
});
} else {
// Original approach - document embedded in prompt
@@ -897,13 +912,25 @@

/**
* Creates a comprehensive rate limiter that tracks both requests and tokens
* @param requestsPerMinute Maximum requests per minute
* @param tokensPerMinute Maximum tokens per minute (optional)
* @param rateLimitEnabled If false, rate limiting is completely disabled for maximum throughput
*/
function createRateLimiter(requestsPerMinute: number, tokensPerMinute?: number) {
function createRateLimiter(
requestsPerMinute: number,
tokensPerMinute?: number,
rateLimitEnabled: boolean = true
) {
const requestTimes: number[] = [];
const tokenUsage: Array<{ timestamp: number; tokens: number }> = [];
const intervalMs = 60 * 1000; // 1 minute in milliseconds

return async function rateLimiter(estimatedTokens: number = 1000) {
// Skip all rate limiting if disabled - maximum throughput mode
if (!rateLimitEnabled) {
return;
}

const now = Date.now();

// Remove old timestamps
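The `createRateLimiter` change above (an early return when rate limiting is disabled, otherwise a sliding one-minute request window) can be condensed into a simplified standalone helper. This is a hypothetical sketch for illustration only; the real implementation also tracks per-minute token usage:

```typescript
// Simplified sketch of the patched rate limiter (hypothetical reconstruction).
// When rateLimitEnabled is false the returned function is a no-op, giving
// unlimited throughput; otherwise it enforces a sliding one-minute window.
function createRateLimiterSketch(
  requestsPerMinute: number,
  rateLimitEnabled: boolean = true
) {
  const requestTimes: number[] = [];
  const intervalMs = 60 * 1000; // 1 minute

  return async function rateLimiter(): Promise<void> {
    if (!rateLimitEnabled) {
      return; // RATE_LIMIT_ENABLED=false: skip all bookkeeping
    }
    const now = Date.now();
    // Drop timestamps that have aged out of the window
    while (requestTimes.length > 0 && now - requestTimes[0] > intervalMs) {
      requestTimes.shift();
    }
    // If the window is full, wait until the oldest request expires
    if (requestTimes.length >= requestsPerMinute) {
      const waitMs = intervalMs - (now - requestTimes[0]);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
    requestTimes.push(Date.now());
  };
}
```

In unlimited mode every `await rateLimiter()` resolves immediately, so throughput is bounded only by the concurrency limit and the batch delay.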
3 changes: 2 additions & 1 deletion src/service.ts
@@ -62,7 +62,8 @@ export class KnowledgeService extends Service {
await new Promise((resolve) => setTimeout(resolve, 1000));

// Get the agent-specific knowledge path from runtime settings
const knowledgePath = this.runtime.getSetting('KNOWLEDGE_PATH');
const knowledgePathSetting = this.runtime.getSetting('KNOWLEDGE_PATH');
const knowledgePath = typeof knowledgePathSetting === 'string' ? knowledgePathSetting : undefined;

const result: LoadResult = await loadDocsFromPath(
this as any,
45 changes: 44 additions & 1 deletion src/types.ts
@@ -51,6 +51,40 @@ export const ModelConfigSchema = z.object({

// Contextual Knowledge settings
CTX_KNOWLEDGE_ENABLED: z.boolean().default(false),

// Rate limiting settings
// Set RATE_LIMIT_ENABLED=false to disable all rate limiting for fast uploads
// Useful when using APIs without rate limits (e.g., self-hosted models)
// High defaults optimized for Vercel gateway / high-throughput APIs
RATE_LIMIT_ENABLED: z.boolean().default(true),

// Maximum concurrent requests (default: 150, set higher for faster processing)
MAX_CONCURRENT_REQUESTS: z
.string()
.or(z.number())
.optional()
.transform((val) => (val ? (typeof val === 'string' ? parseInt(val, 10) : val) : 150)),

// Requests per minute limit (default: 300)
REQUESTS_PER_MINUTE: z
.string()
.or(z.number())
.optional()
.transform((val) => (val ? (typeof val === 'string' ? parseInt(val, 10) : val) : 300)),

// Tokens per minute limit (default: 750000)
TOKENS_PER_MINUTE: z
.string()
.or(z.number())
.optional()
.transform((val) => (val ? (typeof val === 'string' ? parseInt(val, 10) : val) : 750000)),

// Delay between batches in milliseconds (default: 100, set to 0 for no delay)
BATCH_DELAY_MS: z
.string()
.or(z.number())
.optional()
.transform((val) => (val ? (typeof val === 'string' ? parseInt(val, 10) : val) : 100)),
Schema transform incorrectly handles numeric zero for batch delay

The BATCH_DELAY_MS transform uses val ? ... : 100, which treats the number 0 as falsy and silently converts it to 100. The inline comment explicitly states "set to 0 for no delay", and the schema accepts numbers via .or(z.number()), yet passing the number 0 directly returns 100 instead of 0. Environment-variable usage happens to work, since the non-empty string "0" is truthy, but direct programmatic use with the number 0 fails silently. The fix is to replace the truthy check with an explicit one such as val !== undefined && val !== null && val !== ''.
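A minimal sketch of the suggested fix, written as a plain coercion helper (hypothetical name, no zod dependency) so the zero case is easy to check in isolation:

```typescript
// Hypothetical helper mirroring the suggested fix: fall back to the default
// only for undefined, null, or empty string, so the number 0 survives.
// The buggy truthy version, `val ? ... : fallback`, maps 0 to fallback.
function coerceBatchDelay(
  val: string | number | undefined | null,
  fallback: number = 100
): number {
  if (val === undefined || val === null || val === "") {
    return fallback;
  }
  return typeof val === "string" ? parseInt(val, 10) : val;
}

coerceBatchDelay(0);         // 0 (the truthy check would have returned 100)
coerceBatchDelay("0");       // 0
coerceBatchDelay(undefined); // 100
coerceBatchDelay("250");     // 250
```

The same explicit check can be dropped into each zod `.transform()` in the schema above.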


});

export type ModelConfig = z.infer<typeof ModelConfigSchema>;
@@ -67,6 +101,10 @@ export interface ProviderRateLimits {
tokensPerMinute?: number;
// Name of the provider
provider: string;
// Whether rate limiting is enabled (false = unlimited throughput)
rateLimitEnabled: boolean;
// Delay between batches in milliseconds (0 = no delay)
batchDelayMs: number;
}

/**
@@ -160,7 +198,12 @@ export interface KnowledgeConfig {
EMBEDDING_PROVIDER?: string;
TEXT_PROVIDER?: string;
TEXT_EMBEDDING_MODEL?: string;
// Add any other plugin-specific configurations
// Rate limiting configuration
RATE_LIMIT_ENABLED?: boolean;
MAX_CONCURRENT_REQUESTS?: number;
REQUESTS_PER_MINUTE?: number;
TOKENS_PER_MINUTE?: number;
BATCH_DELAY_MS?: number;
}

export interface LoadResult {