feat: v1.1 Quality Hardening — real metrics, pattern decay, graph redesign#25
Conversation
- Map API key_configured to VaultStatus.encrypted in fetchVaultStatus - Add created_at field to vault credential list response
- Add clipboard.ts utility with Clipboard API + execCommand fallback - Replace navigator.clipboard calls in TokensView and VaultView - Show revoked tokens dimmed with red badge, sorted to bottom - Hide revoke button for already-revoked tokens
Prefer result.results array over result.observations for inject response, using actual returned count instead of requested limit.
Replace hardcoded zeros in vector metrics endpoint with real instrumentation data from an atomic ring buffer in pgvector client. - Add queryCount, queryLatencyNs, latencyBuf atomics to Client struct - Instrument Query() with timing defer - Add GetMetrics() returning VectorMetricsSnapshot (count, avg, p50/p95/p99, docs) - Rewrite handleVectorMetrics to return only real data - Update VectorMetrics TypeScript type and SystemView.vue to match new shape
- T004: Remove unused scope param from engram-search.ts Zod schema - T005: Remove conflictStore/relationStore fields from ObservationStore (RelationDetector is wired separately via SetRelationDetector) - T006: Replace days*50 heuristic limit with SQL WHERE created_at_epoch cutoff - T007: Replace node count estimate (edges*1.5) with real SQL UNION query - T008: Replace silent chunk drop with log.Warn including file path and size - T009: Create TECHNICAL_DEBT.md for legitimate deferred items - T010: Remove stale "will be implemented" comment (already implemented)
Replace manual observation ID input with automatic graph loading. On page load, fetches top 50 observations and their relations, builds a vis-network graph with type-colored nodes sized by importance score. Adds filter sidebar for relation types and project dropdown, floating tooltip on hover, and graph stats summary showing node/edge counts and relation type distribution.
Add client-side search filter, sort selector (frequency/confidence/ last seen), items-per-page control (20/50/100), and Pagination component. Add header explanation text and confidence bar display. Backend handler now accepts offset/sort query params.
T019: Raise detection thresholds — MinMatchScore 0.3->0.5, MinFrequencyForPattern 2->5 to reduce noise. T020: Add QualityScore function combining log-scaled frequency (30%), sigmoid confidence (50%), and project diversity (20%). T021: Add HybridDecay with 7-day grace, Gaussian decay 7-60 days, linear terminal 60-90 days. RunDecay deprecates patterns below 0.10 dynamic quality. T022: Add POST /api/maintenance/purge-patterns endpoint with dry_run preview mode for bulk pattern deprecation. T023: Integrate pattern decay into maintenance service — runs as Task 7 during scheduled maintenance cycles.
ПрогулкаОбновлены внутренние хранилища и API: удалены неиспользуемые зависимости из Изменения
Диаграмма последовательностиsequenceDiagram
participant MaintService as Maintenance Service
participant PatternStore as Pattern Store
participant PatternDecay as pattern.RunDecay()
participant DB as Database
MaintService->>PatternStore: GetActivePatterns(ctx, 10000)
PatternStore->>DB: Query active patterns
DB-->>PatternStore: patterns[]
PatternStore-->>MaintService: patterns[]
loop For each pattern
MaintService->>MaintService: QualityScore(freq, conf, projCount, maxFreq)
MaintService->>MaintService: HybridDecay(lastSeenAt)
MaintService->>MaintService: dynamicQuality = baseQuality × decayMultiplier
alt dynamicQuality < 0.10
MaintService->>PatternStore: MarkPatternDeprecated(patternID)
PatternStore->>DB: UPDATE patterns SET deprecated=true
MaintService->>MaintService: deprecated++
end
end
MaintService-->>MaintService: totalPatternDecay += deprecated
MaintService-->>MaintService: Log completion stats
sequenceDiagram
participant Client as Vector Client
participant QueryTracker as Query Instrumentation
participant RingBuffer as Ring Buffer (1000 samples)
participant Storage as Database
Client->>QueryTracker: Query(ctx, query)
QueryTracker->>QueryTracker: t0 = time.Now()
QueryTracker->>Storage: Execute vector search
Storage-->>QueryTracker: results
QueryTracker->>QueryTracker: duration = time.Since(t0)
QueryTracker->>QueryTracker: queryCount++
QueryTracker->>QueryTracker: totalLatency += duration
QueryTracker->>RingBuffer: Append latency sample
QueryTracker-->>Client: results
Client->>QueryTracker: GetMetrics(ctx)
QueryTracker->>RingBuffer: Read up to 1000 samples
QueryTracker->>QueryTracker: Sort samples, compute p50/p95/p99
QueryTracker->>QueryTracker: avgLatency = totalLatency / queryCount
QueryTracker->>Storage: Count()
Storage-->>QueryTracker: totalDocs
QueryTracker-->>Client: VectorMetricsSnapshot{queryCount, avgLatencyMs, p50, p95, p99, totalDocs}
Оценка затрат на код-ревью🎯 4 (Complex) | ⏱️ ~75 минут Возможно связанные MR
Предлагаемые метки
Поэма 🐰
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the system's quality and user experience by replacing stubbed data with real-time metrics, introducing a robust pattern quality and decay mechanism, and redesigning the knowledge graph for better interactivity and data representation. It also addresses several dashboard-related UX issues and eliminates various hardcoded values and approximations, leading to more accurate and reliable system behavior. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request refactors the ObservationStore constructor to remove unused conflictStore and relationStore parameters, updating all call sites accordingly. It introduces a new GetObservationsSinceEpoch method for efficient time-based observation retrieval, replacing a heuristic-based approach in analytics handlers. The RelationStore gains a GetDistinctNodeCount method for accurate graph node counting, which is now used in the UI's graph stats. Vector metrics have been enhanced with real-time query instrumentation, including latency percentiles, replacing previous placeholder values. A new pattern decay mechanism is implemented, including HybridDecay and RunDecay functions, to deprecate low-quality patterns, and this is integrated into the maintenance service and exposed via a new API endpoint. The pattern detection configuration now uses higher thresholds for MinMatchScore and MinFrequencyForPattern. The UI for patterns has been significantly improved with client-side filtering, sorting, and pagination, and new pattern types have been added. The graph view in the UI has been overhauled to dynamically fetch and render relations for top observations, including node sizing based on importance, relation type filtering, and a hover tooltip for node details. Additionally, a new copyToClipboard utility has been added to the UI, and credential listings now include CreatedAt information.
| if m.options.MaxChunkSize > 0 && len(chunk.Content) > m.options.MaxChunkSize { | ||
| // TODO: Consider splitting large chunks intelligently | ||
| // For now, skip chunks that are too large | ||
| log.Warn().Str("file", chunk.FilePath).Int("size", len(chunk.Content)).Int("max", m.options.MaxChunkSize).Msg("Chunk exceeds MaxChunkSize, skipping") |
| lastSeen, _ = time.Parse(time.RFC3339, p.LastSeenAt) | ||
| } | ||
| if lastSeen.IsZero() && p.CreatedAt != "" { | ||
| lastSeen, _ = time.Parse(time.RFC3339, p.CreatedAt) | ||
| } |
There was a problem hiding this comment.
The logic for determining lastSeen is robust, handling cases where LastSeenAt or CreatedAt might be empty. However, if both LastSeenAt and CreatedAt are empty, lastSeen will remain time.IsZero(), which might lead to unexpected decay behavior if not explicitly handled. While HybridDecay will return 1.0 for ageDays <= offset (which would be true for a zero time relative to time.Now()), it's good practice to ensure lastSeen is always a valid, non-zero time if possible, perhaps defaulting to time.Now() or a minimum valid time if both fields are absent.
| sFreq := math.Log10(1+float64(frequency)) / math.Log10(1+maxF) | ||
|
|
||
| // Sigmoid confidence centered at 0.5 (50% weight) | ||
| k := 10.0 |
There was a problem hiding this comment.
The constant k := 10.0 is used in the sigmoid function. While it's a constant within this function, it acts as a 'magic number' that controls the steepness of the sigmoid curve. For clarity and easier tuning, it might be beneficial to define this as a named constant (e.g., sigmoidSteepnessFactor) at the package level or within the function, with a brief comment explaining its purpose.
| samples := make([]int64, n) | ||
| // Read the most recent n entries from the ring buffer. | ||
| // latencyIdx is the next-write position; the most recent n entries end at latencyIdx-1. | ||
| startIdx := c.latencyIdx.Load() - n |
There was a problem hiding this comment.
The calculation for startIdx could potentially be negative if c.latencyIdx.Load() is less than n. While idx%latencyBufSize handles negative numbers in Go by producing a negative result (e.g., -1 % 1000 is -1), this could lead to an out-of-bounds access if latencyBuf is indexed with a negative number. It's safer to ensure startIdx is non-negative before the modulo operation, for example, by adding latencyBufSize until it's positive: startIdx := (c.latencyIdx.Load() - n + latencyBufSize) % latencyBufSize or startIdx := (c.latencyIdx.Load() - n). If startIdx is negative, (startIdx % latencyBufSize + latencyBufSize) % latencyBufSize would ensure a positive index. However, given the ring buffer nature, (startIdx + i) % latencyBufSize might already handle it correctly if startIdx is negative but startIdx + i eventually becomes positive within the loop. Still, explicitly handling negative startIdx for clarity and safety is a good practice.
| // Get all active patterns | ||
| patterns, err = store.GetActivePatterns(r.Context(), limit) | ||
| // Get all active patterns with offset support | ||
| patterns, err = store.GetActivePatterns(r.Context(), limit+offset) |
There was a problem hiding this comment.
Fetching limit + offset patterns from the database and then manually slicing them in application code (patterns = patterns[offset:] and patterns = patterns[:limit]) can be inefficient for very large datasets or high offset values. It's generally more efficient to push the offset and limit directly to the database query using GORM's Offset() and Limit() methods. This allows the database to handle the pagination, reducing data transfer and memory usage in the application.
| if (!nodeMap.has(r.relation.source_id)) { | ||
| nodeMap.set(r.relation.source_id, { | ||
| id: r.relation.source_id, | ||
| label: r.source_title || `#${r.relation.source_id}`, | ||
| type: r.source_type, | ||
| score: 0.5, | ||
| }) | ||
| } | ||
| if (!nodeMap.has(r.relation.target_id)) { | ||
| nodeMap.set(r.relation.target_id, { | ||
| id: r.relation.target_id, | ||
| label: r.target_title || `#${r.relation.target_id}`, | ||
| type: r.target_type, | ||
| score: 0.5, | ||
| }) | ||
| } |
There was a problem hiding this comment.
The logic for adding nodes from relations that might not be in the top observations (lines 103-118) assigns a default score: 0.5 to these nodes. While this provides a fallback, it might not accurately reflect the actual importance of these indirectly connected observations. Consider fetching the actual importance score for these nodes if possible, or at least documenting this default value's implication.
| stabilization: { iterations: 150 }, | ||
| barnesHut: { gravitationalConstant: -4000, springLength: 180, springConstant: 0.04 }, |
There was a problem hiding this comment.
The stabilization iterations for the physics engine are set to 150, and barnesHut parameters are adjusted. While these values might work well for typical graph sizes, they can significantly impact performance for very large graphs. It might be beneficial to make these physics parameters configurable, perhaps based on the number of nodes/edges, or expose them as settings in the UI for advanced users to tune.
| <div v-if="hasGraph" class="grid grid-cols-2 md:grid-cols-4 gap-4 mb-4"> | ||
| <div class="p-3 rounded-xl border-2 border-slate-700/50 bg-gradient-to-br from-slate-800/50 to-slate-900/50"> | ||
| <span class="text-xs text-slate-500 block">Nodes</span> | ||
| <span class="text-lg font-bold text-white font-mono">{{ graphStats.nodeCount }}</span> | ||
| <span class="text-lg font-bold text-white font-mono">{{ nodeCount }}</span> | ||
| </div> | ||
| <div class="p-3 rounded-xl border-2 border-slate-700/50 bg-gradient-to-br from-slate-800/50 to-slate-900/50"> | ||
| <span class="text-xs text-slate-500 block">Edges</span> | ||
| <span class="text-lg font-bold text-white font-mono">{{ graphStats.edgeCount }}</span> | ||
| </div> | ||
| <div class="p-3 rounded-xl border-2 border-slate-700/50 bg-gradient-to-br from-slate-800/50 to-slate-900/50"> | ||
| <span class="text-xs text-slate-500 block">Avg Degree</span> | ||
| <span class="text-lg font-bold text-white font-mono">{{ graphStats.avgDegree.toFixed(1) }}</span> | ||
| <span class="text-lg font-bold text-white font-mono">{{ edgeCount }}</span> | ||
| </div> | ||
| <div class="p-3 rounded-xl border-2 border-slate-700/50 bg-gradient-to-br from-slate-800/50 to-slate-900/50"> | ||
| <span class="text-xs text-slate-500 block">Max Degree</span> | ||
| <span class="text-lg font-bold text-white font-mono">{{ graphStats.maxDegree }}</span> | ||
| <div | ||
| v-for="(count, rtype) in relationTypeCounts" | ||
| :key="rtype" | ||
| class="p-3 rounded-xl border-2 border-slate-700/50 bg-gradient-to-br from-slate-800/50 to-slate-900/50" | ||
| > | ||
| <span class="text-xs text-slate-500 block capitalize">{{ String(rtype).replace(/_/g, ' ') }}</span> | ||
| <span class="text-lg font-bold text-white font-mono">{{ count }}</span> | ||
| </div> |
There was a problem hiding this comment.
The graph stats summary now includes nodeCount, edgeCount, and counts for each relationType. This is a great improvement over the previous approximation. For completeness, consider adding the average degree and max degree metrics back, as they provide valuable insights into the graph's structure and density, which were present in the previous graphStats.
| const PATTERN_TYPE_CONFIG: Record<string, { colorClass: string; bgClass: string; borderClass: string; icon: string }> = { | ||
| workflow: { colorClass: 'text-violet-300', bgClass: 'bg-violet-500/20', borderClass: 'border-violet-500/30', icon: 'fa-diagram-project' }, | ||
| best_practice: { colorClass: 'text-emerald-300', bgClass: 'bg-emerald-500/20', borderClass: 'border-emerald-500/30', icon: 'fa-check-circle' }, | ||
| anti_pattern: { colorClass: 'text-red-300', bgClass: 'bg-red-500/20', borderClass: 'border-red-500/30', icon: 'fa-ban' }, | ||
| bug: { colorClass: 'text-orange-300', bgClass: 'bg-orange-500/20', borderClass: 'border-orange-500/30', icon: 'fa-bug' }, | ||
| refactor: { colorClass: 'text-blue-300', bgClass: 'bg-blue-500/20', borderClass: 'border-blue-500/30', icon: 'fa-rotate' }, | ||
| architecture: { colorClass: 'text-cyan-300', bgClass: 'bg-cyan-500/20', borderClass: 'border-cyan-500/30', icon: 'fa-building' }, |
There was a problem hiding this comment.
There was a problem hiding this comment.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
internal/pattern/detector_test.go (1)
334-339:⚠️ Potential issue | 🟠 MajorТест всё ещё фиксирует старые дефолтные пороги.
По целям этого PR дефолты должны быть
MinMatchScore = 0.5иMinFrequencyForPattern = 5, а здесь по-прежнему ожидаются0.3и2. Если реализация уже обновлена, этот тест будет падать; если нет — он закрепляет старое поведение вместо quality hardening.💡 Возможная правка
- if config.MinMatchScore != 0.3 { - t.Errorf("MinMatchScore = %f, want 0.3", config.MinMatchScore) + if config.MinMatchScore != 0.5 { + t.Errorf("MinMatchScore = %f, want 0.5", config.MinMatchScore) } - if config.MinFrequencyForPattern != 2 { - t.Errorf("MinFrequencyForPattern = %d, want 2", config.MinFrequencyForPattern) + if config.MinFrequencyForPattern != 5 { + t.Errorf("MinFrequencyForPattern = %d, want 5", config.MinFrequencyForPattern) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pattern/detector_test.go` around lines 334 - 339, The test is asserting outdated defaults: update the expectations for config.MinMatchScore and config.MinFrequencyForPattern in detector_test.go to the new defaults (expect MinMatchScore == 0.5 and MinFrequencyForPattern == 5); locate the assertions referencing config.MinMatchScore and config.MinFrequencyForPattern and change the expected literal values (or, better, compare against the canonical defaults/constants used by the implementation if available) so the test matches the new default behavior.internal/worker/handlers_patterns.go (2)
63-72:⚠️ Potential issue | 🟡 MinorНесогласованная поддержка offset для разных фильтров.
Offset применяется только для ветки без фильтров (
GetActivePatterns). При фильтрации поtypeилиproject(строки 65, 68) offset игнорируется, что приводит к несогласованному поведению API.♻️ Предложенное исправление
if patternType != "" { // Filter by type - patterns, err = store.GetPatternsByType(r.Context(), models.PatternType(patternType), limit) + patterns, err = store.GetPatternsByType(r.Context(), models.PatternType(patternType), limit+offset) } else if project != "" { // Filter by project - patterns, err = store.GetPatternsByProject(r.Context(), project, limit) + patterns, err = store.GetPatternsByProject(r.Context(), project, limit+offset) } else {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/worker/handlers_patterns.go` around lines 63 - 72, The branch logic ignores offset for filtered queries (patternType and project) causing inconsistent pagination; update the calls to use offset the same way as the unfiltered path by passing offset to the store methods: modify uses of store.GetPatternsByType and store.GetPatternsByProject to accept/apply offset (either by passing limit+offset like the GetActivePatterns call or by updating those methods to accept separate limit and offset parameters) so pagination is consistent across patternType, project and the unfiltered GetActivePatterns path.
22-25:⚠️ Potential issue | 🟡 MinorSwagger-документация не описывает новые параметры.
Параметры
offsetиsortне задокументированы в godoc-комментариях, хотя они принимаются API. Это приведёт к неполной OpenAPI-спецификации.📝 Предложенное исправление
// `@Param` type query string false "Filter by pattern type" // `@Param` project query string false "Filter by project" // `@Param` limit query int false "Number of results (default 100)" +// `@Param` offset query int false "Number of results to skip (default 0)" +// `@Param` sort query string false "Sort order: frequency, confidence, recency (reserved for future use)" // `@Success` 200 {array} models.Pattern🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/worker/handlers_patterns.go` around lines 22 - 25, The Swagger godoc block that documents the patterns endpoint (the comment containing `@Param` type, `@Param` project, `@Param` limit and `@Success` 200 {array} models.Pattern) is missing the offset and sort parameters; add two `@Param` lines: one for offset (query int false "Pagination offset (default 0)") and one for sort (query string false "Sort order, e.g. 'created_at:desc' or 'name:asc'"), matching the existing style so OpenAPI includes these query params for the patterns handler.
🧹 Nitpick comments (4)
TECHNICAL_DEBT.md (1)
8-11: Рассмотрите удаление неиспользуемой таблицы или повышение приоритета её заполнения.Наличие пустой таблицы
memory_blocks, которая создана миграцией, но никогда не заполняется, является техническим долгом с недооценённым влиянием:
- Увеличивает сложность схемы базы данных
- Может ввести в заблуждение будущих разработчиков
- Занимает пространство имён схемы без пользы
Если заполнение таблицы не планируется в ближайшем будущем, следует рассмотреть возможность удаления таблицы через обратную миграцию. Альтернативно, если функциональность действительно необходима, повысьте приоритет доработки планировщика консолидации.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TECHNICAL_DEBT.md` around lines 8 - 11, The migration-created table memory_blocks (migration 024) is never populated; either remove it with a rollback migration or implement its population by prioritizing the consolidation scheduler; to fix, choose one: (A) add a new rollback migration that drops the memory_blocks table and remove any references to it from code and schema docs (refer to migration 024 and any schema registration code), or (B) raise work priority and implement the consolidation scheduler changes that insert into memory_blocks during consolidation flows (identify the consolidation scheduler module/class and the insertion logic that should populate memory_blocks), then add tests that validate memory_blocks is populated.ui/src/utils/api.ts (1)
501-508: LGTM!Корректный маппинг между форматом ответа backend (
key_configured,fingerprint) и интерфейсом frontend (encrypted,key_fingerprint). Использование fallback-значений (?? false,?? 0) обеспечивает устойчивость к неполным ответам.Для улучшения типобезопасности можно определить интерфейс для raw-ответа:
♻️ Опциональное улучшение
interface RawVaultStatus { key_configured?: boolean fingerprint?: string key_fingerprint?: string credential_count?: number } export async function fetchVaultStatus(signal?: AbortSignal): Promise<VaultStatus> { const raw = await fetchWithRetry<RawVaultStatus>(`${API_BASE}/vault/status`, { signal }) return { encrypted: raw.key_configured ?? false, key_fingerprint: raw.fingerprint ?? raw.key_fingerprint, credential_count: raw.credential_count ?? 0, } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui/src/utils/api.ts` around lines 501 - 508, Add a typed interface for the backend response and use it with fetchWithRetry to improve type safety: define an interface RawVaultStatus with optional fields key_configured, fingerprint, key_fingerprint, and credential_count, then change the fetch call in fetchVaultStatus to fetchWithRetry<RawVaultStatus>(...) so `raw` is strongly typed while keeping the existing mapping to VaultStatus.internal/worker/handlers_maintenance.go (1)
191-233: Дублирование логики расчёта качества паттернов.Логика вычисления
QualityScoreиHybridDecayв dry-run режиме (строки 200-225) дублирует код изpattern.RunDecay(internal/pattern/decay.go, строки 45-91). Это создаёт риск рассинхронизации при будущих изменениях порога или формулы.Рассмотрите возможность вынести preview-логику в отдельную функцию в пакете
pattern, напримерPreviewDecay(ctx, store) (total, wouldDeprecate int, error).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/worker/handlers_maintenance.go` around lines 191 - 233, The dry-run block duplicates the quality/decay calculation in pattern.RunDecay by reusing pattern.QualityScore and pattern.HybridDecay inline; extract that preview logic into a single function in the pattern package (e.g., PreviewDecay(ctx context.Context, s store.Interface) (totalActive int, wouldDeprecate int, err error)) that encapsulates fetching active patterns, computing maxFreq, baseQuality, lastSeen parsing, decayMultiplier and the <0.10 threshold, and returning counts and error; then replace the inline dry-run loop in handlers_maintenance.go with a call to pattern.PreviewDecay and propagate errors to writeJSON/http.Error as appropriate so the handler no longer duplicates RunDecay logic.internal/pattern/decay.go (1)
47-52: Рассмотрите пакетную обработку для больших объёмов данных.Загрузка до 10,000 паттернов в память одним запросом может быть ресурсоёмкой. Для production-систем с большим количеством паттернов рекомендуется рассмотреть пакетную обработку с курсором или пагинацией.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pattern/decay.go` around lines 47 - 52, Текущее чтение всех паттернов через patternStore.GetActivePatterns с maxPatterns=10000 загружает всё в память; замените это на пакетную обработку: измените вызов в decay.go (убрать константу maxPatterns) и итерируйте по страницам, вызывая patternStore.GetActivePatterns (или добавить новый метод типа ListActivePatterns(ctx, limit, offset/cursor)) в цикле с ограничением batchSize, обрабатывайте/накладывайте decay для каждой порции и прекращайте, когда возвращается пустой срез; сохраняйте существующую логику обработки паттернов, но перенесите её внутрь цикла и обеспечьте использование курсора/offset или следующего токена для устойчивой работы с большими объёмами.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/db/gorm/observation_store.go`:
- Around line 576-591: Update GetObservationsSinceEpoch on ObservationStore to
accept a limit parameter and apply it to the DB query (use .Limit(limit)) while
preserving the existing Where, Scopes (projectScopeFilter(project),
importanceOrdering()) and ordering behavior; change the function signature, add
validation or a sane default if needed, apply .Limit before
Find(&dbObservations), and update all callers to pass the new limit so
toModelObservations(dbObservations) still returns the same model slice.
In `@internal/pattern/decay.go`:
- Around line 70-77: The parsing of timestamps in HybridDecay currently ignores
errors (parsing p.LastSeenAt and p.CreatedAt) which can leave lastSeen as
time.Time{} and cause immediate deprecation; update the parsing code in
internal/pattern/decay.go to check the error returned by time.Parse when parsing
p.LastSeenAt and p.CreatedAt, log the parse error (use the package logger or the
appropriate logger in scope) and treat a parse failure as "timestamp
unavailable" rather than silently using the zero time—fallback to the other
field if available, and if both fail set lastSeen to time.Now() (or a safe
default) so patterns are not immediately deprecated; ensure you touch the
parsing logic around lastSeen, p.LastSeenAt, p.CreatedAt and HybridDecay to
implement these checks and logging.
- Around line 81-88: The hardcoded 0.10 threshold and immediate deprecation are
too aggressive: make the threshold configurable (e.g., env var
ENGRAM_DECAY_THRESHOLD used by RunDecay()), add a tolerance and
consecutive-failure requirement (track per-pattern consecutiveLowQuality count
in the pattern record or an in-memory map and require N consecutive runs before
calling patternStore.MarkPatternDeprecated), and introduce an intermediate
"warning" state (e.g., patternStore.MarkPatternWarning(ctx, p.ID) or a status
field) to be set on the first breach so only repeated breaches lead to
deprecation; update logic around dynamicQuality, p.ID, deprecated and
patternStore.MarkPatternDeprecated accordingly.
In `@internal/vector/pgvector/client.go`:
- Around line 393-408: GetMetrics currently reads from latencyBuf concurrently
with writes in Query causing a Go data race; convert latencyBuf to a slice of
atomic.Int64 (or protect with a mutex) and update all writes in Query to use
Store on latencyBuf[(i)%latencyBufSize] and all reads in GetMetrics to use Load
when building samples; keep existing logic for computing startIdx from
latencyIdx and modulo by latencyBufSize, and ensure allocations (samples :=
make([]int64, n)) and sorts remain unchanged so the metrics calculation
(p50/p95/p99) uses the loaded atomic values.
- Around line 143-150: The defer that writes dur into c.latencyBuf using idx
from c.latencyIdx and the GetMetrics() reader access the same buffer without
synchronization (symbols: c.latencyBuf, c.latencyIdx, GetMetrics), causing a
read-write race; also the struct field mu sync.RWMutex is declared but never
used. Fix by either (A) making the behavior explicitly documented as best-effort
and removing mu, or (B) using mu (or a dedicated mutex) to protect writes and
reads: in the deferred function lock for write (mu.Lock/Unlock) around the
c.latencyBuf[...] assignment and increment of c.latencyIdx, and in GetMetrics()
use mu.RLock/RUnlock while reading the buffer and computing metrics; ensure
queryCount/queryLatencyNs updates remain safe (or likewise protected) consistent
with your chosen strategy.
In `@internal/worker/handlers_maintenance.go`:
- Around line 235-239: pattern.RunDecay is currently invoked with r.Context(),
which can be cancelled if the client disconnects leading to partial execution;
instead create a new timeout context (e.g., via
context.WithTimeout(context.Background(), <reasonableDuration>)) and pass that
to pattern.RunDecay, mirroring the approach used in handleRunMaintenance, and
ensure you call cancel() to release resources; update the call that currently
passes r.Context() to use this new ctx while keeping the same error handling for
the returned deprecated, err values.
In `@plugin/engram/hooks/session-start.js`:
- Around line 55-59: Удалите бессмысленную проверку result.results и упростите
инициализацию переменной observations: вместо ветвления, которое сначала
проверяет result.results, оставьте только проверку
Array.isArray(result.observations) и в противном случае присвойте пустой массив;
обновите место, где объявлена переменная observations (используется symbol
observations и ссылки на result.observations), и при необходимости добавьте
короткий комментарий о том, что API возвращает только observations для обратной
совместимости.
In `@TECHNICAL_DEBT.md`:
- Around line 13-16: The doc entry under "Config Reload via os.Exit(0)"
understates reliability risks: update the text for reloadConfig/os.Exit(0) to
enumerate real impacts (in-flight HTTP requests termination, lack of graceful
shutdown, potential data/transaction loss, user-facing downtime) and remove the
line that Docker restart policy mitigates these issues; instead, create an
action item/issue to replace os.Exit(0) with a graceful reload strategy by
refactoring reloadConfig to signal a coordinator (e.g., main/server lifecycle)
and implement proper shutdown handling (context cancellation, server.Shutdown,
connection draining, transaction flush) so reloads are done without abrupt
termination.
- Around line 3-6: Update TECHNICAL_DEBT.md to correct the MCP spec claim and
then ensure the server code follows the spec: remove the statement that
unsupported capabilities may return empty lists and instead note that servers
must omit unsupported entries from the initial "capabilities" declaration and
return JSON-RPC error -32601 ("Method not found") for calls to methods like
resources/list, prompts/list, and completion/complete when the corresponding
capability is not declared; verify and fix the code paths that build the
capabilities object (the initialization code that publishes "capabilities") and
the RPC handlers for resources/list, prompts/list, and completion/complete so
they either are declared and return possibly-empty lists when supported, or are
not declared and return -32601 when invoked.
In `@ui/src/utils/clipboard.ts`:
- Around line 13-24: The temporary textarea created for copying (the textarea
element and the success boolean returned from document.execCommand('copy')) must
be cleaned up in a finally block so it is always removed from the DOM if
appendChild() succeeds; refactor the function that creates textarea (the
textarea variable), call textarea = document.createElement(...) inside the try
but declare it before the try, set a default success = false, perform the
append/select/execCommand('copy') inside try, set success accordingly, and in
finally check if textarea exists and is attached (textarea.parentNode) and
remove it; return the success boolean after the try/catch/finally.
In `@ui/src/views/GraphView.vue`:
- Around line 74-83: relationTypeCounts is being populated from allRelations
before deduplication which causes double-counting when
fetchObservationRelations() returns the same relation twice; update the logic
that builds relationTypes and relationTypeCounts to operate on the same
deduplicated relation set used to compute edgeCount (e.g., dedupe allRelations
first by the relation's unique identifier or by a stable key and then compute
relationTypes.value and relationTypeCounts.value from that deduplicated array) —
adjust code paths around allRelations, relationTypeCounts, relationTypes, and
the place where deduplication is performed so both counts and rendered edges
come from the same deduplicated source.
- Around line 44-56: When loadGraph returns no observations (and similarly in
the catch block) also reset relationTypes, relationTypeCounts and clear the
current vis-network so stale filters/graph don't persist: inside loadGraph
(function loadGraph) where you set nodeCount and edgeCount to 0, also set
relationTypes.value = [] and relationTypeCounts.value = {} (or their
equivalents) and clear the network display (e.g. reset the vis/network data or
call the component's clear method on visNetwork/network). Apply the same resets
in the catch block so errors also remove previous graph state.
In `@ui/src/views/SystemView.vue`:
- Around line 230-231: The template directly calls
vectorMetrics.avg_latency_ms.toFixed(1) (and similar for p50_latency_ms,
p95_latency_ms, p99_latency_ms) which will throw if those properties are
undefined; update the bindings in SystemView.vue to safely access these values
using optional chaining and a fallback (e.g., (vectorMetrics?.avg_latency_ms ??
0).toFixed(1)) or coerce with Number(...) before calling toFixed so the UI never
calls toFixed on undefined and always displays a sensible default.
---
Outside diff comments:
In `@internal/pattern/detector_test.go`:
- Around line 334-339: The test is asserting outdated defaults: update the
expectations for config.MinMatchScore and config.MinFrequencyForPattern in
detector_test.go to the new defaults (expect MinMatchScore == 0.5 and
MinFrequencyForPattern == 5); locate the assertions referencing
config.MinMatchScore and config.MinFrequencyForPattern and change the expected
literal values (or, better, compare against the canonical defaults/constants
used by the implementation if available) so the test matches the new default
behavior.
In `@internal/worker/handlers_patterns.go`:
- Around line 63-72: The branch logic ignores offset for filtered queries
(patternType and project) causing inconsistent pagination; update the calls to
use offset the same way as the unfiltered path by passing offset to the store
methods: modify uses of store.GetPatternsByType and store.GetPatternsByProject
to accept/apply offset (either by passing limit+offset like the
GetActivePatterns call or by updating those methods to accept separate limit and
offset parameters) so pagination is consistent across patternType, project and
the unfiltered GetActivePatterns path.
- Around line 22-25: The Swagger godoc block that documents the patterns
endpoint (the comment containing `@Param` type, `@Param` project, `@Param` limit and
`@Success` 200 {array} models.Pattern) is missing the offset and sort parameters;
add two `@Param` lines: one for offset (query int false "Pagination offset
(default 0)") and one for sort (query string false "Sort order, e.g.
'created_at:desc' or 'name:asc'"), matching the existing style so OpenAPI
includes these query params for the patterns handler.
---
Nitpick comments:
In `@internal/pattern/decay.go`:
- Around line 47-52: Текущее чтение всех паттернов через
patternStore.GetActivePatterns с maxPatterns=10000 загружает всё в память;
замените это на пакетную обработку: измените вызов в decay.go (убрать константу
maxPatterns) и итерируйте по страницам, вызывая patternStore.GetActivePatterns
(или добавить новый метод типа ListActivePatterns(ctx, limit, offset/cursor)) в
цикле с ограничением batchSize, обрабатывайте/накладывайте decay для каждой
порции и прекращайте, когда возвращается пустой срез; сохраняйте существующую
логику обработки паттернов, но перенесите её внутрь цикла и обеспечьте
использование курсора/offset или следующего токена для устойчивой работы с
большими объёмами.
In `@internal/worker/handlers_maintenance.go`:
- Around line 191-233: The dry-run block duplicates the quality/decay
calculation in pattern.RunDecay by reusing pattern.QualityScore and
pattern.HybridDecay inline; extract that preview logic into a single function in
the pattern package (e.g., PreviewDecay(ctx context.Context, s store.Interface)
(totalActive int, wouldDeprecate int, err error)) that encapsulates fetching
active patterns, computing maxFreq, baseQuality, lastSeen parsing,
decayMultiplier and the <0.10 threshold, and returning counts and error; then
replace the inline dry-run loop in handlers_maintenance.go with a call to
pattern.PreviewDecay and propagate errors to writeJSON/http.Error as appropriate
so the handler no longer duplicates RunDecay logic.
In `@TECHNICAL_DEBT.md`:
- Around line 8-11: The migration-created table memory_blocks (migration 024) is
never populated; either remove it with a rollback migration or implement its
population by prioritizing the consolidation scheduler; to fix, choose one: (A)
add a new rollback migration that drops the memory_blocks table and remove any
references to it from code and schema docs (refer to migration 024 and any
schema registration code), or (B) raise work priority and implement the
consolidation scheduler changes that insert into memory_blocks during
consolidation flows (identify the consolidation scheduler module/class and the
insertion logic that should populate memory_blocks), then add tests that
validate memory_blocks is populated.
In `@ui/src/utils/api.ts`:
- Around line 501-508: Add a typed interface for the backend response and use it
with fetchWithRetry to improve type safety: define an interface RawVaultStatus
with optional fields key_configured, fingerprint, key_fingerprint, and
credential_count, then change the fetch call in fetchVaultStatus to
fetchWithRetry<RawVaultStatus>(...) so `raw` is strongly typed while keeping the
existing mapping to VaultStatus.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 45cfde6b-5e69-4e63-8386-ee80c1e8ae17
📒 Files selected for processing (34)
TECHNICAL_DEBT.mdcmd/mcp/main.gointernal/chunking/manager.gointernal/db/gorm/benchmark_test.gointernal/db/gorm/conflict_store_test.gointernal/db/gorm/integration_test.gointernal/db/gorm/observation_store.gointernal/db/gorm/observation_store_test.gointernal/db/gorm/relation_store.gointernal/graph/observation_graph.gointernal/maintenance/service.gointernal/pattern/decay.gointernal/pattern/detector.gointernal/pattern/detector_test.gointernal/pattern/quality.gointernal/vector/interface.gointernal/vector/pgvector/client.gointernal/vector/types.gointernal/worker/handlers_analytics.gointernal/worker/handlers_data.gointernal/worker/handlers_maintenance.gointernal/worker/handlers_patterns.gointernal/worker/handlers_vault.gointernal/worker/service.goplugin/engram/hooks/session-start.jsplugin/openclaw-engram/src/tools/engram-search.tsui/src/types/api.tsui/src/utils/api.tsui/src/utils/clipboard.tsui/src/views/GraphView.vueui/src/views/PatternsView.vueui/src/views/SystemView.vueui/src/views/TokensView.vueui/src/views/VaultView.vue
💤 Files with no reviewable changes (2)
- internal/graph/observation_graph.go
- plugin/openclaw-engram/src/tools/engram-search.ts
| // GetObservationsSinceEpoch retrieves observations created at or after the given epoch (ms). | ||
| // Includes project-scoped observations for the specified project AND global observations. | ||
| // Results are ordered by created_at_epoch DESC. | ||
| func (s *ObservationStore) GetObservationsSinceEpoch(ctx context.Context, project string, sinceEpochMs int64) ([]*models.Observation, error) { | ||
| var dbObservations []Observation | ||
| err := s.db.WithContext(ctx). | ||
| Where("created_at_epoch >= ?", sinceEpochMs). | ||
| Scopes(projectScopeFilter(project), importanceOrdering()). | ||
| Find(&dbObservations).Error | ||
|
|
||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| return toModelObservations(dbObservations), nil | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Проверяем, как используется GetObservationsSinceEpoch и есть ли ограничения на вызывающей стороне
rg -n 'GetObservationsSinceEpoch' --type goRepository: thebtf/engram
Length of output: 511
🏁 Script executed:
# Check the calling context at handlers_analytics.go:63
sed -n '50,80p' internal/worker/handlers_analytics.goRepository: thebtf/engram
Length of output: 1067
🏁 Script executed:
# Look for other similar methods in observation_store.go to see if they use Limit
rg -n 'func.*ObservationStore.*\(' internal/db/gorm/observation_store.go -A 3 | head -50Repository: thebtf/engram
Length of output: 1761
🏁 Script executed:
# Check if there are any tests for GetObservationsSinceEpoch
rg -n 'GetObservationsSinceEpoch' --type go -B 2 -A 5Repository: thebtf/engram
Length of output: 1841
🏁 Script executed:
# Check if other methods in observation_store use Limit
rg 'Limit\(' internal/db/gorm/observation_store.go -B 3 -A 1Repository: thebtf/engram
Length of output: 2337
🏁 Script executed:
# Get the full observation_store.go to understand the structure better
wc -l internal/db/gorm/observation_store.goRepository: thebtf/engram
Length of output: 99
🏁 Script executed:
# Check if there are any pagination-related methods or patterns
rg 'func.*ObservationStore.*\(' internal/db/gorm/observation_store.go | grep -i -E 'limit|page|offset'Repository: thebtf/engram
Length of output: 2411
Добавьте параметр limit для ограничения количества возвращаемых записей.
Метод GetObservationsSinceEpoch не применяет LIMIT в запросе к базе данных. Это нарушает паттерн, используемый всеми похожими методами в ObservationStore (все они принимают параметр limit). При запросе диапазона 365 дней для активного проекта метод может вернуть тысячи или миллионы записей, что приведёт к исчерпанию памяти. Проблема усугубляется тем, что метод вызывается из HTTP-обработчика аналитики, который подвергается влиянию пользовательского ввода.
Предлагаемое исправление
-func (s *ObservationStore) GetObservationsSinceEpoch(ctx context.Context, project string, sinceEpochMs int64) ([]*models.Observation, error) {
+func (s *ObservationStore) GetObservationsSinceEpoch(ctx context.Context, project string, sinceEpochMs int64, limit int) ([]*models.Observation, error) {
+ if limit <= 0 {
+ limit = 1000
+ }
var dbObservations []Observation
err := s.db.WithContext(ctx).
Where("created_at_epoch >= ?", sinceEpochMs).
Scopes(projectScopeFilter(project), importanceOrdering()).
+ Limit(limit).
Find(&dbObservations).Error🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/db/gorm/observation_store.go` around lines 576 - 591, Update
GetObservationsSinceEpoch on ObservationStore to accept a limit parameter and
apply it to the DB query (use .Limit(limit)) while preserving the existing
Where, Scopes (projectScopeFilter(project), importanceOrdering()) and ordering
behavior; change the function signature, add validation or a sane default if
needed, apply .Limit before Find(&dbObservations), and update all callers to
pass the new limit so toModelObservations(dbObservations) still returns the same
model slice.
| var lastSeen time.Time | ||
| if p.LastSeenAt != "" { | ||
| lastSeen, _ = time.Parse(time.RFC3339, p.LastSeenAt) | ||
| } | ||
| if lastSeen.IsZero() && p.CreatedAt != "" { | ||
| lastSeen, _ = time.Parse(time.RFC3339, p.CreatedAt) | ||
| } | ||
|
|
There was a problem hiding this comment.
Игнорирование ошибок парсинга времени может привести к некорректной депрекации паттернов.
Если LastSeenAt и CreatedAt не удаётся распарсить, lastSeen останется нулевым (time.Time{}). В этом случае HybridDecay вычислит возраст паттерна как десятки лет, что приведёт к немедленной депрекации паттерна независимо от его качества.
🛡️ Предлагаемое исправление с логированием и fallback
var lastSeen time.Time
if p.LastSeenAt != "" {
- lastSeen, _ = time.Parse(time.RFC3339, p.LastSeenAt)
+ var err error
+ lastSeen, err = time.Parse(time.RFC3339, p.LastSeenAt)
+ if err != nil {
+ log.Warn().Err(err).Int64("pattern_id", p.ID).Str("last_seen_at", p.LastSeenAt).Msg("Failed to parse LastSeenAt")
+ }
}
if lastSeen.IsZero() && p.CreatedAt != "" {
- lastSeen, _ = time.Parse(time.RFC3339, p.CreatedAt)
+ var err error
+ lastSeen, err = time.Parse(time.RFC3339, p.CreatedAt)
+ if err != nil {
+ log.Warn().Err(err).Int64("pattern_id", p.ID).Str("created_at", p.CreatedAt).Msg("Failed to parse CreatedAt")
+ }
+ }
+
+ // Skip patterns with unparseable timestamps to avoid incorrect deprecation
+ if lastSeen.IsZero() {
+ log.Warn().Int64("pattern_id", p.ID).Msg("Skipping pattern with unparseable timestamps")
+ continue
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/pattern/decay.go` around lines 70 - 77, The parsing of timestamps in
HybridDecay currently ignores errors (parsing p.LastSeenAt and p.CreatedAt)
which can leave lastSeen as time.Time{} and cause immediate deprecation; update
the parsing code in internal/pattern/decay.go to check the error returned by
time.Parse when parsing p.LastSeenAt and p.CreatedAt, log the parse error (use
the package logger or the appropriate logger in scope) and treat a parse failure
as "timestamp unavailable" rather than silently using the zero time—fallback to
the other field if available, and if both fail set lastSeen to time.Now() (or a
safe default) so patterns are not immediately deprecated; ensure you touch the
parsing logic around lastSeen, p.LastSeenAt, p.CreatedAt and HybridDecay to
implement these checks and logging.
| if dynamicQuality < 0.10 { | ||
| if err := patternStore.MarkPatternDeprecated(ctx, p.ID); err != nil { | ||
| log.Warn().Err(err).Int64("pattern_id", p.ID).Msg("Failed to deprecate pattern") | ||
| continue | ||
| } | ||
| deprecated++ | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What are industry best practices for quality score thresholds in pattern deprecation systems?
💡 Result:
Industry practice is not to use one universal “quality score” cutoff, but to design tiered, risk-based thresholds and a repeatable deprecation workflow.
1) Use multi-level thresholds (not a single line)
Define at least two or three operating bands (for example warning / error / fatal), where only the most severe band auto-enforces or blocks. This pattern is explicitly reflected in common data-quality rule systems that distinguish warning/error/fatal semantics for escalating action. [1]
2) Gate deprecation on sustained breach, not one bad run
Best practice is to require a score to be below threshold for N consecutive evaluation windows (and/or over a rolling window) before deprecating, to avoid churn from noise. Many platforms frame thresholds as alert triggers rather than immediate irreversible action. [2][3]
3) Add “tolerance” (error-budget) so small breakage doesn’t flip status
Instead of deducting/penalizing on any violation, apply a tolerance threshold so a rule/pattern is treated as passing until the violating fraction exceeds the tolerance (and suppress alerts until then). [4]
4) Ensure thresholds are calibrated to the metric and domain
For “match/similarity” style quality scores, vendors commonly treat thresholds as a tunable strictness control (looser vs tighter), often with best-practice defaults and bounded ranges (e.g., match scores on a 0–100 scale with configurable thresholds). [5][6]
5) Separate “quality for alerting” from “quality for deprecation”
A common control-plane pattern is:
- Warning threshold: notify / open ticket
- Error threshold: block new adoption (freeze)
- Deprecation threshold: begin retirement workflow (announce + migration window)
This aligns with products that support thresholds for alerting/ticketing when scores fall below configured values. [2][7]
6) Make deprecation a process with explicit states and communication
Treat “deprecated” as a lifecycle state in a maturity model (with criteria and a defined process), rather than just “score < X”. This is a common design-system pattern for components/patterns. [8]
Practical template (what many teams implement)
- Per-pattern thresholds (by criticality): higher-risk patterns require higher quality.
- Minimum sample size before evaluating (no deprecation on low volume).
- Hysteresis (harder to deprecate than to stay active; harder to “un-deprecate” than to warn).
- Human approval for auto-deprecation candidates; auto-actions limited to warning/freeze.
Sources:
[1] DQOps severity levels (warning/error/fatal). (dqops.com)
[2] Ataccama thresholds trigger alerts when quality is unacceptable. (docs.ataccama.com)
[3] Syniti threshold-driven status (e.g., warning/green). (skthelp.syniti.com)
[4] Collibra tolerance thresholds suppress scoring/alerts below tolerance. (productresources.collibra.com)
[5] Oracle match threshold: match score thresholding with defined ranges. (docs.oracle.com)
[6] SAP DQM: “best practice” default match threshold; tune looser/tighter. (help.sap.com)
[7] Informatica: tickets when data quality score goes below threshold. (docs.informatica.com)
[8] VA.gov design system maturity scale includes “Deprecated” state and a deprecation process. (design.va.gov)
Citations:
- 1: https://dqops.com/docs/dqo-concepts/configuring-data-quality-checks-and-rules/?utm_source=openai
- 2: https://docs.ataccama.com/ataccama-one-agentic/early-access/data-quality/data-quality-thresholds.html?utm_source=openai
- 3: https://skthelp.syniti.com/docs/data-quality-thresholds?utm_source=openai
- 4: https://productresources.collibra.com/docs/collibra/dqc/latest/Content/DataQuality/DQCoreComponents/Rules/ta_rule-tolerance.htm?utm_source=openai
- 5: https://docs.oracle.com/cd/E58886_01/books/PDF/DataQUCM.pdf?utm_source=openai
- 6: https://help.sap.com/doc/7bf7979bb47b49ef8b7c58db0897c1e8/4.0.10/en-US/dqmsap_hana_userguide_en.pdf?utm_source=openai
- 7: https://docs.informatica.com/release-information/what-s-new-in-idmc/current-version/what-s-new/data-governance-and-catalog/new-features-and-enhancements.html?utm_source=openai
- 8: https://design.va.gov/about/maturity-scale?utm_source=openai
🏁 Script executed:
# Search for the decay.go file and examine its context
find . -name "decay.go" -type f | head -5Repository: thebtf/engram
Length of output: 84
🏁 Script executed:
# Look for any configuration or constants related to deprecation thresholds
rg -i "0\.10|deprecat|dynamic.*quality" --type go -B 2 -A 2 | head -100Repository: thebtf/engram
Length of output: 6429
🏁 Script executed:
# Search for tests related to pattern deprecation
fd -e go -x rg -l "deprecat" {} \; | head -20Repository: thebtf/engram
Length of output: 576
🏁 Script executed:
# Look for documentation or comments about quality thresholds
rg -i "threshold|quality.*score|deprecation.*policy" --type go -B 1 -A 1 | head -80Repository: thebtf/engram
Length of output: 4994
🏁 Script executed:
# Read the full decay.go file to understand the deprecation logic
cat -n ./internal/pattern/decay.goRepository: thebtf/engram
Length of output: 3295
🏁 Script executed:
# Search for any configuration or constants related to 0.10 or quality thresholds
rg "0\.10|QUALITY|THRESHOLD|DEPRECAT" --type go -B 2 -A 2Repository: thebtf/engram
Length of output: 7031
🏁 Script executed:
# Look for any comments or documentation about threshold rationale
rg -i "quality.*threshold|deprecat.*quality|0\.1" --type go --type md | head -30Repository: thebtf/engram
Length of output: 2645
Порог 0.10 — рекомендуется сделать конфигурируемым и добавить защиты от излишней агрессивности.
Текущая реализация имеет несколько проблем:
-
Жёстко закодированный порог (без конфигурации): в отличие от других порогов системы (
ENGRAM_DEDUP_SIMILARITY_THRESHOLD,ENGRAM_CLUSTERING_THRESHOLD), параметр 0.10 не параметризуется и не может быть подстроен без изменения кода. -
Одноразовая оценка без "охлаждения":
RunDecay()выполняет одну проверку и сразу помечает паттерн как deprecated. Нет требования к устойчивому нарушению (например, низкое качество на протяжении N последовательных периодов оценки), что может привести к депрекации качественных паттернов, просто редко используемых. -
Отсутствие многоуровневого подхода: нет отличия между уровнями серьёзности (предупреждение → ошибка → deprecation). Паттерн либо активен, либо сразу помечен как deprecated.
-
Нет механизма допуска (tolerance): нет буфера для небольших колебаний качества.
Рекомендуется: сделать порог конфигурируемым (среди переменных окружения), требовать устойчивое нарушение порога на нескольких итерациях, рассмотреть введение промежуточного состояния (warning) перед финальной deprecation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/pattern/decay.go` around lines 81 - 88, The hardcoded 0.10 threshold
and immediate deprecation are too aggressive: make the threshold configurable
(e.g., env var ENGRAM_DECAY_THRESHOLD used by RunDecay()), add a tolerance and
consecutive-failure requirement (track per-pattern consecutiveLowQuality count
in the pattern record or an in-memory map and require N consecutive runs before
calling patternStore.MarkPatternDeprecated), and introduce an intermediate
"warning" state (e.g., patternStore.MarkPatternWarning(ctx, p.ID) or a status
field) to be set on the first breach so only repeated breaches lead to
deprecation; update logic around dynamicQuality, p.ID, deprecated and
patternStore.MarkPatternDeprecated accordingly.
| start := time.Now() | ||
| defer func() { | ||
| dur := time.Since(start).Nanoseconds() | ||
| c.queryCount.Add(1) | ||
| c.queryLatencyNs.Add(dur) | ||
| idx := c.latencyIdx.Add(1) - 1 | ||
| c.latencyBuf[idx%latencyBufSize] = dur | ||
| }() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Проверяем, используется ли mutex где-либо в файле
rg -n 'mu\.(R)?Lock|mu\.(R)?Unlock' internal/vector/pgvector/client.goRepository: thebtf/engram
Length of output: 39
🏁 Script executed:
# Get file size and check structure
wc -l internal/vector/pgvector/client.goRepository: thebtf/engram
Length of output: 95
🏁 Script executed:
# Read the constant and struct definition (lines 40-60)
sed -n '40,60p' internal/vector/pgvector/client.goRepository: thebtf/engram
Length of output: 681
🏁 Script executed:
# Read the query instrumentation code (lines 140-155)
sed -n '140,155p' internal/vector/pgvector/client.goRepository: thebtf/engram
Length of output: 494
🏁 Script executed:
# Read the GetMetrics method implementation (lines 370-425)
sed -n '370,425p' internal/vector/pgvector/client.goRepository: thebtf/engram
Length of output: 1720
🏁 Script executed:
# Check if there are any synchronization primitives used in the whole file
rg -n 'atomic\.|sync\.|mutex|lock' internal/vector/pgvector/client.go | head -30Repository: thebtf/engram
Length of output: 184
🏁 Script executed:
# Check the Go version requirement in go.mod
head -5 go.modRepository: thebtf/engram
Length of output: 110
Гонка данных при записи в кольцевой буфер и неиспользуемое поле синхронизации.
-
Гонка данных в
latencyBuf: Запись вc.latencyBuf[idx%latencyBufSize](строка 149) происходит без синхронизации. При параллельных запросах несколько горутин могут одновременно писать в один индекс, аGetMetrics()(строки 393-399) читает из буфера без синхронизации — это классическая гонка read-write. Хотя это может быть намеренным дизайном для приблизительных метрик, требуется явное документирование и либо добавление синхронизации, либо явное заявление о допустимости потери данных. -
Неиспользуемое поле
mu sync.RWMutex: Поле объявлено в структуре (строка 51), но никогда не используется для блокировки. Это указывает на неполную реализацию стратегии синхронизации. Либо добавьте синхронизацию черезmu, либо удалите неиспользуемое поле.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/vector/pgvector/client.go` around lines 143 - 150, The defer that
writes dur into c.latencyBuf using idx from c.latencyIdx and the GetMetrics()
reader access the same buffer without synchronization (symbols: c.latencyBuf,
c.latencyIdx, GetMetrics), causing a read-write race; also the struct field mu
sync.RWMutex is declared but never used. Fix by either (A) making the behavior
explicitly documented as best-effort and removing mu, or (B) using mu (or a
dedicated mutex) to protect writes and reads: in the deferred function lock for
write (mu.Lock/Unlock) around the c.latencyBuf[...] assignment and increment of
c.latencyIdx, and in GetMetrics() use mu.RLock/RUnlock while reading the buffer
and computing metrics; ensure queryCount/queryLatencyNs updates remain safe (or
likewise protected) consistent with your chosen strategy.
| samples := make([]int64, n) | ||
| // Read the most recent n entries from the ring buffer. | ||
| // latencyIdx is the next-write position; the most recent n entries end at latencyIdx-1. | ||
| startIdx := c.latencyIdx.Load() - n | ||
| for i := int64(0); i < n; i++ { | ||
| samples[i] = c.latencyBuf[(startIdx+i)%latencyBufSize] | ||
| } | ||
| sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] }) | ||
| p50 = float64(samples[n*50/100]) / 1e6 | ||
| p95 = float64(samples[n*95/100]) / 1e6 | ||
| idx99 := n * 99 / 100 | ||
| if idx99 >= n { | ||
| idx99 = n - 1 | ||
| } | ||
| p99 = float64(samples[idx99]) / 1e6 | ||
| } |
There was a problem hiding this comment.
Чтение из кольцевого буфера без синхронизации.
GetMetrics читает из latencyBuf одновременно с записями в Query. Это data race по стандартам Go memory model. Хотя для метрик допустима некоторая погрешность, go race detector будет сообщать об ошибке.
Варианты исправления:
- Использовать
atomic.Int64для каждого элемента буфера - Защитить буфер мьютексом
- Документировать, что метрики приблизительные
Минимальное исправление с atomic
- latencyBuf [latencyBufSize]int64 // ring buffer for percentile calculation
+ latencyBuf [latencyBufSize]atomic.Int64 // ring buffer for percentile calculation🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/vector/pgvector/client.go` around lines 393 - 408, GetMetrics
currently reads from latencyBuf concurrently with writes in Query causing a Go
data race; convert latencyBuf to a slice of atomic.Int64 (or protect with a
mutex) and update all writes in Query to use Store on
latencyBuf[(i)%latencyBufSize] and all reads in GetMetrics to use Load when
building samples; keep existing logic for computing startIdx from latencyIdx and
modulo by latencyBufSize, and ensure allocations (samples := make([]int64, n))
and sorts remain unchanged so the metrics calculation (p50/p95/p99) uses the
loaded atomic values.
| ## 2026-03-19: Config Reload via os.Exit(0) | ||
| What: reloadConfig calls os.Exit(0) instead of hot-reload | ||
| Why deferred: Hot-reload requires significant refactoring of service initialization | ||
| Impact: Docker restart policy handles the restart automatically |
There was a problem hiding this comment.
Недооценка влияния: os.Exit(0) создаёт риски надёжности.
Описание влияния существенно занижает серьёзность проблемы. Использование os.Exit(0) для перезагрузки конфигурации вызывает следующие проблемы:
- Потеря обрабатываемых запросов: HTTP-запросы в процессе обработки будут прерваны
- Отсутствие graceful shutdown: соединения не закрываются корректно
- Риск потери данных: незавершённые транзакции могут быть потеряны
- Прерывание сервиса: пользователи столкнутся с простоем во время перезапуска
Тот факт, что Docker автоматически перезапускает сервис, не устраняет эти проблемы — он лишь автоматизирует восстановление после сбоя. Это проблема надёжности и качества обслуживания пользователей, которая должна быть приоритизирована выше, чем указано в текущей документации.
Хотите, чтобы я помог определить минимальный объём рефакторинга для реализации graceful reload, или открыть issue для отслеживания этой задачи с более высоким приоритетом?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@TECHNICAL_DEBT.md` around lines 13 - 16, The doc entry under "Config Reload
via os.Exit(0)" understates reliability risks: update the text for
reloadConfig/os.Exit(0) to enumerate real impacts (in-flight HTTP requests
termination, lack of graceful shutdown, potential data/transaction loss,
user-facing downtime) and remove the line that Docker restart policy mitigates
these issues; instead, create an action item/issue to replace os.Exit(0) with a
graceful reload strategy by refactoring reloadConfig to signal a coordinator
(e.g., main/server lifecycle) and implement proper shutdown handling (context
cancellation, server.Shutdown, connection draining, transaction flush) so
reloads are done without abrupt termination.
| try { | ||
| const textarea = document.createElement('textarea') | ||
| textarea.value = text | ||
| textarea.style.position = 'fixed' | ||
| textarea.style.left = '-9999px' | ||
| document.body.appendChild(textarea) | ||
| textarea.select() | ||
| const success = document.execCommand('copy') | ||
| document.body.removeChild(textarea) | ||
| return success | ||
| } catch { | ||
| return false |
There was a problem hiding this comment.
Удаляйте временный <textarea> в finally.
Если исключение произойдёт после appendChild(), скрытый элемент останется в DOM. На повторных неудачных копированиях это будет накапливать мусор.
💡 Возможная правка
try {
const textarea = document.createElement('textarea')
textarea.value = text
textarea.style.position = 'fixed'
textarea.style.left = '-9999px'
document.body.appendChild(textarea)
- textarea.select()
- const success = document.execCommand('copy')
- document.body.removeChild(textarea)
- return success
+ try {
+ textarea.select()
+ return document.execCommand('copy')
+ } finally {
+ document.body.removeChild(textarea)
+ }
} catch {
return false
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui/src/utils/clipboard.ts` around lines 13 - 24, The temporary textarea
created for copying (the textarea element and the success boolean returned from
document.execCommand('copy')) must be cleaned up in a finally block so it is
always removed from the DOM if appendChild() succeeds; refactor the function
that creates textarea (the textarea variable), call textarea =
document.createElement(...) inside the try but declare it before the try, set a
default success = false, perform the append/select/execCommand('copy') inside
try, set success accordingly, and in finally check if textarea exists and is
attached (textarea.parentNode) and remove it; return the success boolean after
the try/catch/finally.
| async function loadGraph() { | ||
| loading.value = true | ||
| error.value = null | ||
| hoveredNode.value = null | ||
|
|
||
| try { | ||
| const graph: RelationGraph = await fetchObservationGraph(observationId, 2) | ||
| // Step 1: Fetch top observations | ||
| const observations = await fetchTopObservations(selectedProject.value || undefined, 50) | ||
| if (observations.length === 0) { | ||
| nodeCount.value = 0 | ||
| edgeCount.value = 0 | ||
| loading.value = false | ||
| return |
There was a problem hiding this comment.
Сбрасывайте старый граф при пустом результате и ошибке.
При переключении с проекта с данными на пустой здесь обнуляются только nodeCount и edgeCount. relationTypes, relationTypeCounts и текущий vis-network остаются от предыдущего успешного запроса, поэтому UI продолжает показывать устаревшие фильтры и граф для нового проекта. Та же проблема остаётся и в catch.
💡 Возможная правка
async function loadGraph() {
loading.value = true
error.value = null
hoveredNode.value = null
try {
// Step 1: Fetch top observations
const observations = await fetchTopObservations(selectedProject.value || undefined, 50)
if (observations.length === 0) {
nodeCount.value = 0
edgeCount.value = 0
- loading.value = false
+ relationTypes.value = []
+ relationTypeCounts.value = {}
+ enabledRelationTypes.value = new Set()
+ if (network) {
+ network.destroy()
+ network = null
+ }
return
}
@@
} catch (err) {
+ nodeCount.value = 0
+ edgeCount.value = 0
+ relationTypes.value = []
+ relationTypeCounts.value = {}
+ if (network) {
+ network.destroy()
+ network = null
+ }
error.value = err instanceof Error ? err.message : 'Failed to load graph'
} finally {
loading.value = false
}
}Also applies to: 144-147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui/src/views/GraphView.vue` around lines 44 - 56, When loadGraph returns no
observations (and similarly in the catch block) also reset relationTypes,
relationTypeCounts and clear the current vis-network so stale filters/graph
don't persist: inside loadGraph (function loadGraph) where you set nodeCount and
edgeCount to 0, also set relationTypes.value = [] and relationTypeCounts.value =
{} (or their equivalents) and clear the network display (e.g. reset the
vis/network data or call the component's clear method on visNetwork/network).
Apply the same resets in the catch block so errors also remove previous graph
state.
| // Collect all relation types for filter | ||
| const typeSet = new Set<string>() | ||
| const typeCounts: Record<string, number> = {} | ||
| for (const rel of allRelations) { | ||
| const rt = rel.relation.relation_type | ||
| typeSet.add(rt) | ||
| typeCounts[rt] = (typeCounts[rt] || 0) + 1 | ||
| } | ||
| relationTypes.value = Array.from(typeSet).sort() | ||
| relationTypeCounts.value = typeCounts |
There was a problem hiding this comment.
Счётчики типов сейчас считают дубликаты.
Если обе вершины связи попали в топ-50, fetchObservationRelations() вернёт одну и ту же relation из обоих запросов. Здесь relationTypeCounts строится до дедупликации, а edgeCount — уже после неё, поэтому карточки по типам будут системно завышены относительно реально отрисованных рёбер.
Also applies to: 126-137
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui/src/views/GraphView.vue` around lines 74 - 83, relationTypeCounts is being
populated from allRelations before deduplication which causes double-counting
when fetchObservationRelations() returns the same relation twice; update the
logic that builds relationTypes and relationTypeCounts to operate on the same
deduplicated relation set used to compute edgeCount (e.g., dedupe allRelations
first by the relation's unique identifier or by a stable key and then compute
relationTypes.value and relationTypeCounts.value from that deduplicated array) —
adjust code paths around allRelations, relationTypeCounts, relationTypes, and
the place where deduplication is performed so both counts and rendered edges
come from the same deduplicated source.
| <span class="text-slate-600 text-xs block">Avg Latency</span> | ||
| <span class="text-slate-300 font-mono">{{ vectorMetrics.avg_latency_ms.toFixed(1) }}ms</span> |
There was a problem hiding this comment.
Потенциальная ошибка при вызове .toFixed() на undefined.
Если API вернёт неполный объект vectorMetrics (например, при ошибке или старой версии бэкенда), вызов vectorMetrics.avg_latency_ms.toFixed(1) выбросит TypeError. Аналогично для p50_latency_ms, p95_latency_ms, p99_latency_ms.
Рассмотрите добавление fallback-значений:
Предлагаемое исправление
- <span class="text-slate-300 font-mono">{{ vectorMetrics.avg_latency_ms.toFixed(1) }}ms</span>
+ <span class="text-slate-300 font-mono">{{ (vectorMetrics.avg_latency_ms ?? 0).toFixed(1) }}ms</span>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <span class="text-slate-600 text-xs block">Avg Latency</span> | |
| <span class="text-slate-300 font-mono">{{ vectorMetrics.avg_latency_ms.toFixed(1) }}ms</span> | |
| <span class="text-slate-600 text-xs block">Avg Latency</span> | |
| <span class="text-slate-300 font-mono">{{ (vectorMetrics.avg_latency_ms ?? 0).toFixed(1) }}ms</span> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui/src/views/SystemView.vue` around lines 230 - 231, The template directly
calls vectorMetrics.avg_latency_ms.toFixed(1) (and similar for p50_latency_ms,
p95_latency_ms, p99_latency_ms) which will throw if those properties are
undefined; update the bindings in SystemView.vue to safely access these values
using optional chaining and a fallback (e.g., (vectorMetrics?.avg_latency_ms ??
0).toFixed(1)) or coerce with Number(...) before calling toFixed so the UI never
calls toFixed on undefined and always displays a sensible default.
Closes #25: the "Add Comment (as operator)" field was a single-line <input> that couldn't hold multi-line content like stack traces or code blocks. Now uses a <textarea> with 4 default rows, resizable up to 400px. Submit shortcut changed from Enter to Ctrl+Enter (Cmd+Enter on macOS) so Enter inserts a newline.
* feat: server-side feedback import endpoint + CLI HTTP client - POST /api/import/feedback: accepts feedback_*.md content, processes through server-side LLM (FeedbackImportSystemPrompt), stores as type=guidance observation with semantic dedup (0.90 threshold) - CLI engram-import: sends files to server via HTTP instead of calling LLM directly (LLM only accessible from server/LM Studio) - Removes direct LLM dependency from CLI binary * fix(feedback): finalize session outcome identity handling * fix(context): enforce by-file project scoping * fix(learning): remove hit-rate markdown reparsing * fix(ui): replace single-line comment input with multi-line textarea Closes #25: the "Add Comment (as operator)" field was a single-line <input> that couldn't hold multi-line content like stack traces or code blocks. Now uses a <textarea> with 4 default rows, resizable up to 400px. Submit shortcut changed from Enter to Ctrl+Enter (Cmd+Enter on macOS) so Enter inserts a newline. --------- Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
Summary
Comprehensive quality hardening: eliminates all stubs, fixes dashboard UX, implements pattern quality system.
Phase 1: Stub Elimination (T001-T010)
Phase 2: Dashboard UX (T011-T016)
Phase 3: Dashboard Enhancements (T017-T018)
Phase 4: Pattern Quality (T019-T023)
Test plan
go build ./cmd/worker/compilesSummary by CodeRabbit
Заметки о выпуске
Новые возможности
Улучшения