@renecannao

Summary

Implement firewall functionality and query digest statistics for MCP (Model Context Protocol) queries in ProxySQL, mirroring the existing MySQL query rules and statistics infrastructure.

Key Design

The action is inferred from rule properties, as with MySQL/PostgreSQL query rules:

  • error_msg != NULL → block
  • replace_pattern != NULL → rewrite
  • timeout_ms > 0 → timeout
  • OK_msg != NULL → return OK message
  • otherwise → allow

Actions are NOT mutually exclusive: a single rule can perform multiple actions simultaneously (for example, a rule with both replace_pattern and timeout_ms set rewrites the query and also applies the timeout).

Changes

  • Add mcp_query_rules table for firewall rules (17 columns)
  • Add runtime_mcp_query_rules table (same 17 columns, shows in-memory state)
  • Add stats_mcp_query_rules table with rule_id and hits
  • Add stats_mcp_query_digest table for query statistics (10 columns)
  • Implement rule evaluation with flagIN/flagOUT chaining (see the sketch after this list)
  • Implement query digest computation using SpookyHash
  • Implement hit counters maintained in-memory only
  • Add LOAD/SAVE commands for MCP query rules
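
As a rough illustration of the flagIN/flagOUT chaining above, the evaluation loop follows the same semantics as ProxySQL's MySQL query rules. The sketch below is illustrative only; the type and member names are simplified stand-ins, not the actual implementation:

#include <vector>

// Simplified stand-in for MCP_Query_Rule; matches() is stubbed out.
struct Rule {
	bool active;
	int flagIN;   // rule is considered only when the current flag equals flagIN
	int flagOUT;  // if set, becomes the current flag after a match
	bool apply;   // if true, stop evaluating further rules
	bool matches() const { return true; } // pattern/tool/schema checks go here
};

void evaluate(const std::vector<Rule>& rules) {
	int flag = 0; // evaluation starts with flagIN = 0
	for (const Rule& r : rules) {
		if (!r.active || r.flagIN != flag) continue; // not part of the current chain
		if (!r.matches()) continue;
		// ... accumulate actions here: rewrite, error_msg, OK_msg, timeout ...
		if (r.flagOUT > 0) flag = r.flagOUT; // chain into the next group of rules
		if (r.apply) break;                  // final matching rule for this query
	}
}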

Tables Added

  • mcp_query_rules - Config table for defining rules
  • runtime_mcp_query_rules - Runtime view of active rules
  • stats_mcp_query_rules - Hit counters per rule
  • stats_mcp_query_digest - Query digest statistics
  • stats_mcp_query_digest_reset - Reset-on-read stats

Commands Added

  • LOAD MCP QUERY RULES FROM DISK
  • LOAD MCP QUERY RULES TO MEMORY
  • LOAD MCP QUERY RULES TO RUNTIME
  • SAVE MCP QUERY RULES TO DISK
  • SAVE MCP QUERY RULES TO MEMORY
  • SAVE MCP QUERY RULES FROM RUNTIME

Documentation

This PR also includes comprehensive documentation for:

  • Static harvester functions
  • Query tool handler helpers
  • Schema isolation in catalog
  • MCP query rules and digest functions

Test plan

  • Create a test rule
  • Verify LOAD/SAVE commands work
  • Verify runtime_mcp_query_rules shows hits
  • Verify stats_mcp_query_rules tracks hits
  • Verify stats_mcp_query_digest tracks queries
  • Test rule evaluation (block, rewrite, timeout)
  • Test flagIN/flagOUT chaining

… schema

- Add ADMIN_SQLITE_TABLE_RUNTIME_MCP_QUERY_RULES schema (17 columns, same as mcp_query_rules)
- Fix STATS_SQLITE_TABLE_MCP_QUERY_RULES to only have rule_id and hits columns
- Add runtime_mcp_query_rules detection and refresh in ProxySQL_Admin
- Implement save_mcp_query_rules_from_runtime(bool _runtime) for both config and runtime tables
- Update get_mcp_query_rules() to return 17 columns (no hits)
- get_stats_mcp_query_rules() returns 2 columns (rule_id, hits)

Mirrors the MySQL query rules pattern:
- mcp_query_rules: config table (17 cols)
- runtime_mcp_query_rules: runtime state (17 cols)
- stats_mcp_query_rules: hit counters (2 cols)
- Add separate MCP QUERY RULES command block in Admin_Handler
- Fix string length comparison (21 chars for "SAVE/LOAD MCP QUERY RULES ")
- Add handlers for:
  - LOAD MCP QUERY RULES TO RUNTIME
  - SAVE MCP QUERY RULES TO DISK
  - SAVE MCP QUERY RULES TO MEMORY / FROM RUNTIME
- Register mcp_query_rules in disk database (tables_defs_config)

Previously MCP commands were incorrectly nested inside MYSQL/PGSQL
block and could not be reached. Now they have their own conditional
block.
- Add LOAD MCP QUERY RULES FROM DISK command
- Add LOAD MCP QUERY RULES TO MEMORY command
- Both commands copy rules from disk.mcp_query_rules to main.mcp_query_rules

This completes the full set of MCP query rules LOAD/SAVE commands,
matching the MySQL query rules pattern.
Change function signature from stats___mcp_query_rules(bool reset) to
stats___mcp_query_rules() to match MySQL query rules pattern.

The reset parameter was never used in the function body and MySQL's
stats___mysql_query_rules() has no reset parameter.
Previously re_modifiers was hardcoded to 1 (CASELESS), ignoring the
value stored in the database. Now properly reads from row->fields[7].
Add detailed function-level documentation to all MCP query rules,
query digest, static harvester, and catalog components.

Static_Harvester.cpp:
- Document all 18+ harvest functions (schemas, objects, columns, indexes, FKs, views)
- Document lifecycle methods (init, close, connect, disconnect)
- Document helper methods (is_time_type, is_id_like_name)
- Document run management (start_run, finish_run, run_full_harvest)
- Document statistics methods (get_harvest_stats)

Query_Tool_Handler.cpp:
- Document JSON helper functions (json_string, json_int, json_double)
- Document digest tracking section with flow explanation

MySQL_Catalog.cpp:
- Document schema isolation architecture
- Document CRUD operations (upsert, get, search, list, remove, merge)

Discovery_Schema.cpp:
- Document MCP query rules evaluation (evaluate_mcp_query_rules)
- Document digest functions (compute_mcp_digest, fingerprint_mcp_args)
- Document update/get functions for rules and digests

ProxySQL_Admin_Stats.cpp:
- Document stats collection functions

ProxySQL_Admin.cpp:
- Document load/save functions for query rules

Admin_Handler.cpp:
- Document MCP query rules command handlers

include/ProxySQL_Admin_Tables_Definitions.h:
- Add comments explaining table purposes

coderabbitai bot commented Jan 19, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.



Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a new MCP subsystem: in-memory rule storage and evaluation, digest/statistics tracking, admin/runtime table definitions and commands, catalog lifecycle and schema-aware catalog ops, plus a Static_Harvester for MySQL metadata harvesting.

Changes

  • MCP Data Structures & Declarations (include/Discovery_Schema.h, include/ProxySQL_Admin_Tables_Definitions.h, include/proxysql_admin.h): Adds MCP_Query_Rule, MCP_Query_Digest_Stats, MCP_Query_Processor_Output; new private state (rule vector, rwlocks, digest umap, version); new public MCP APIs (load/evaluate/get/update/get_stats/compute/fingerprint); adds 5 SQLite table macros for MCP rules, runtime rules, and digest/rules stats; new includes.
  • MCP Core Implementation (lib/Discovery_Schema.cpp, lib/Query_Tool_Handler.cpp): Implements rule lifecycle (load/compile rules), rule evaluation (multi-action: rewrite, block, timeout, OK, flags), digest computation (fingerprinting JSON args + SpookyHash), in-memory digest updates, and integrates rule evaluation & digest updates into the query execution path.
  • Admin Integration & CLI Handlers (lib/Admin_Bootstrap.cpp, lib/Admin_Handler.cpp, lib/ProxySQL_Admin.cpp, lib/ProxySQL_Admin_Stats.cpp): Registers MCP tables in admin/config/stats, adds CLI/admin handlers for LOAD/SAVE TO/FROM RUNTIME, implements runtime load/save functions, implements stats export functions for digest and rule hit counters, and wires MCP into GenericRefreshStatistics.
  • Catalog Lifecycle & Schema Ops (lib/MySQL_Catalog.cpp): Refactors the catalog to an explicit lifecycle (init/close), adds schema-scoped CRUD/search/list/upsert/merge/remove operations, creates schema indexes and FTS setup.
  • Metadata Harvester (lib/Static_Harvester.cpp): Implements Static_Harvester: MySQL connection management, a multi-stage harvest pipeline (schemas, objects, columns, indexes, FKs, views), quick profiles, FTS rebuild, and run-scoped tracking and stats.

Sequence Diagram(s)

sequenceDiagram
    participant QTH as Query_Tool_Handler
    participant DS as Discovery_Schema
    participant DB as AdminDB
    participant QP as MCP_Processor

    QTH->>DS: evaluate_mcp_query_rules(tool_name, schemaname, arguments, original_query)
    activate DS
    DS->>QP: iterate active rules (regex match / filters)
    activate QP
    QP->>QP: evaluate match_pattern, username, schemaname, flags
    QP-->>DS: MCP_Query_Processor_Output (actions: rewrite/error/OK/timeout, flags)
    deactivate QP
    DS-->>QTH: MCP_Query_Processor_Output*
    deactivate DS

    alt OK_msg set
        QTH->>QTH: return success with OK_msg (early)
    else error_msg set
        QTH->>QTH: return error_msg (early)
    else new_query provided
        QTH->>QTH: execute rewritten query
        QTH->>DS: compute_mcp_digest(tool_name, arguments)
        DS-->>QTH: digest, digest_text
        QTH->>DS: update_mcp_query_digest(tool_name, run_id, digest, digest_text, duration, ts)
        DS->>DB: (optional) persist digest stats / later export via stats___mcp_query_digest
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

🐰 In burrows of bytes I hop and sing,

Rules and digests now take wing,
I stitch the schema, chase a hash,
Rewrite a query in a flash,
Hooray — the Mission Control bells ring!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (Passed): The title 'feat: Add MCP query rules and digest statistics [WIP]' accurately describes the primary changes, which introduce MCP firewall functionality and query digest statistics tables and infrastructure.
  • Description check (Passed): The description comprehensively relates to the changeset, explaining the MCP query rules implementation, table structures, design decisions, and test plan items that align with the code changes.
  • Docstring Coverage (Passed): Docstring coverage is 82.00%, which is sufficient; the required threshold is 80.00%.




@gemini-code-assist

Summary of Changes

Hello @renecannao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends ProxySQL's capabilities by integrating advanced management and monitoring features for Model Context Protocol (MCP) queries. It introduces a flexible rule-based system to control MCP query behavior, allowing for dynamic modifications or restrictions. Concurrently, it provides deep insights into MCP query performance and usage patterns through detailed digest statistics, enhancing observability and operational control for MCP workloads.

Highlights

  • MCP Query Rules Implementation: Introduced a comprehensive system for MCP (Model Context Protocol) query rules, enabling firewall-like functionality such as blocking, rewriting, timing out, or returning immediate OK messages for MCP queries. This mirrors existing MySQL query rule capabilities.
  • Query Digest Statistics for MCP: Added functionality to collect and track query digest statistics for MCP queries. This includes metrics like execution count, total/min/max execution times, and first/last seen timestamps, using SpookyHash for query fingerprinting.
  • New Configuration and Statistics Tables: Created several new tables: 'mcp_query_rules' for persistent rule configuration, 'runtime_mcp_query_rules' for in-memory active rules, 'stats_mcp_query_rules' for rule hit counters, and 'stats_mcp_query_digest' along with 'stats_mcp_query_digest_reset' for query performance statistics.
  • Enhanced Management Commands: Implemented new 'LOAD' and 'SAVE' commands for MCP query rules, allowing administrators to manage rules between disk, memory, and runtime, similar to other ProxySQL configuration elements.
  • Extensive Documentation and Code Comments: Significantly enhanced code documentation with detailed Doxygen-style comments for static harvester functions, query tool handler helpers, schema isolation in the catalog, and all new MCP query rules and digest functions, improving code clarity and maintainability.


gemini-code-assist bot left a comment


Code Review

This pull request introduces significant new functionality by adding MCP query rules and digest statistics, mirroring existing infrastructure for MySQL/PostgreSQL. The overall design is consistent with the project's patterns. However, there are several areas for improvement. The new code heavily relies on manual memory management with raw pointers, which is error-prone and should be updated to use modern C++ idioms like smart pointers and std::string for better safety and maintainability. I've also identified some duplicated code, an incomplete feature implementation in the statistics handling, and a few inconsistencies between comments and code that should be addressed. My detailed comments highlight these points to help improve the robustness of this new feature.

Comment on lines +831 to +836
MCP_Query_Processor_Output* evaluate_mcp_query_rules(
	const std::string& tool_name,
	const std::string& schemaname,
	const nlohmann::json& arguments,
	const std::string& original_query
);


high

The function evaluate_mcp_query_rules returns a raw pointer MCP_Query_Processor_Output*, and the documentation states that the caller is responsible for destroying it. This manual ownership transfer is error-prone and can easily lead to memory leaks if the caller forgets to delete the object.

To make the code safer and more idiomatic, you should use a smart pointer to manage the lifetime of the returned object automatically.

	std::unique_ptr<MCP_Query_Processor_Output> evaluate_mcp_query_rules(
		const std::string& tool_name,
		const std::string& schemaname,
		const nlohmann::json& arguments,
		const std::string& original_query
	);

Comment on lines 2597 to 2600
// For now, we'll leave the table empty since MCP digest stats are stored in memory
// in the Discovery_Schema and would need to be accessed differently
// TODO: Implement proper access to Discovery_Schema digest statistics


high

This function is intended to populate the stats_mcp_query_digest table, but the implementation is incomplete. It currently only deletes existing data and contains a TODO comment, leaving the table empty. To complete this feature, you should fetch the digest statistics from the Discovery_Schema catalog (using get_mcp_query_digest) and insert the data into the stats table.

Comment on lines +23 to +49
struct MCP_Query_Rule {
	int rule_id;
	bool active;
	char *username;
	char *schemaname;
	char *tool_name;
	char *match_pattern;
	bool negate_match_pattern;
	int re_modifiers; // bitmask: 1=CASELESS
	int flagIN;
	int flagOUT;
	char *replace_pattern;
	int timeout_ms;
	char *error_msg;
	char *ok_msg;
	bool log;
	bool apply;
	char *comment;
	uint64_t hits; // in-memory only, not persisted to table
	void* regex_engine; // compiled regex (RE2)

	MCP_Query_Rule() : rule_id(0), active(false), username(NULL), schemaname(NULL),
		tool_name(NULL), match_pattern(NULL), negate_match_pattern(false),
		re_modifiers(1), flagIN(0), flagOUT(0), replace_pattern(NULL),
		timeout_ms(0), error_msg(NULL), ok_msg(NULL), log(false), apply(true),
		comment(NULL), hits(0), regex_engine(NULL) {}
};


medium

The MCP_Query_Rule struct uses C-style char* for string members and a void* for the regex engine. This requires manual memory management (e.g., strdup/free, new/delete) which is complex and error-prone in C++. Using modern C++ features would make the code safer and more maintainable.

I recommend the following changes:

  • Replace char* members with std::string to automate memory management.
  • Replace void* regex_engine with std::unique_ptr<re2::RE2> to manage the lifetime of the regex object automatically and provide type safety.

These changes would eliminate the need for manual memory management in the Discovery_Schema destructor and load_mcp_query_rules function, reducing the risk of memory leaks or corruption.
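
As a concrete sketch of that recommendation, assuming RE2 and C++17 std::optional are available (member names mirror the current struct; this is illustrative, not the actual patch):

#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <re2/re2.h>

struct MCP_Query_Rule {
	int rule_id = 0;
	bool active = false;
	std::optional<std::string> username;    // unset = match any user
	std::optional<std::string> schemaname;  // unset = match any schema
	std::optional<std::string> tool_name;
	std::optional<std::string> match_pattern;
	bool negate_match_pattern = false;
	int re_modifiers = 1;                   // bitmask: 1 = CASELESS
	int flagIN = 0;
	int flagOUT = 0;
	std::optional<std::string> replace_pattern;
	int timeout_ms = 0;
	std::optional<std::string> error_msg;
	std::optional<std::string> ok_msg;
	bool log = false;
	bool apply = true;
	std::optional<std::string> comment;
	uint64_t hits = 0;                      // in-memory only, not persisted
	std::unique_ptr<re2::RE2> regex_engine; // owned, typed compiled regex
};
// The destructor, move operations, and regex cleanup now come for free, so the
// manual free()/delete blocks in the destructor and load_mcp_query_rules() disappear.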

Comment on lines +351 to +372
db->execute(
	"CREATE TABLE IF NOT EXISTS mcp_query_rules ("
	" rule_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL ,"
	" active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 0 ,"
	" tool_name VARCHAR ,"
	" run_id INT ,"
	" match_pattern VARCHAR ,"
	" negate_match_pattern INT CHECK (negate_match_pattern IN (0,1)) NOT NULL DEFAULT 0 ,"
	" re_modifiers VARCHAR DEFAULT 'CASELESS' ,"
	" flagIN INT NOT NULL DEFAULT 0 ,"
	" flagOUT INT CHECK (flagOUT >= 0) ,"
	" action VARCHAR CHECK (action IN ('allow','block','rewrite','timeout')) NOT NULL DEFAULT 'allow' ,"
	" replace_pattern VARCHAR ,"
	" timeout_ms INT CHECK (timeout_ms >= 0) ,"
	" error_msg VARCHAR ,"
	" OK_msg VARCHAR ,"
	" log INT CHECK (log IN (0,1)) ,"
	" apply INT CHECK (apply IN (0,1)) NOT NULL DEFAULT 1 ,"
	" comment VARCHAR ,"
	" hits INTEGER NOT NULL DEFAULT 0"
	");"
);


medium

The CREATE TABLE statement for mcp_query_rules within this function appears to be dead code. The schema defined here (with run_id and action columns) is inconsistent with the mcp_query_rules table defined in ProxySQL_Admin_Tables_Definitions.h and used by the admin interface. The application logic loads rules from the admin database into an in-memory structure and does not appear to use this table. This code should be removed to avoid confusion and reduce maintenance overhead.

Comment on lines +2372 to +2387
// Clear existing rules
for (auto rule : mcp_query_rules) {
	if (rule->regex_engine) {
		delete (re2::RE2*)rule->regex_engine;
	}
	free(rule->username);
	free(rule->schemaname);
	free(rule->tool_name);
	free(rule->match_pattern);
	free(rule->replace_pattern);
	free(rule->error_msg);
	free(rule->ok_msg);
	free(rule->comment);
	delete rule;
}
mcp_query_rules.clear();


medium

This block of code for clearing existing rules is nearly identical to the cleanup logic in the Discovery_Schema destructor. Duplicating this complex memory management logic increases the risk of bugs and makes maintenance harder.

Consider extracting this cleanup logic into a private helper function (e.g., clear_mcp_query_rules()) that can be called from both the destructor and load_mcp_query_rules to avoid code duplication.
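
A possible shape for that helper, assembled from the cleanup code quoted above (the method name clear_mcp_query_rules() is hypothetical):

void Discovery_Schema::clear_mcp_query_rules() {
	for (auto rule : mcp_query_rules) {
		if (rule->regex_engine) {
			delete (re2::RE2*)rule->regex_engine; // compiled regex
		}
		free(rule->username);
		free(rule->schemaname);
		free(rule->tool_name);
		free(rule->match_pattern);
		free(rule->replace_pattern);
		free(rule->error_msg);
		free(rule->ok_msg);
		free(rule->comment);
		delete rule;
	}
	mcp_query_rules.clear();
}

Both the destructor and load_mcp_query_rules() would then reduce to a single call to this helper.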

Comment on lines +2503 to +2507
if (rule->username) {
	// For now, we don't have username in MCP context, skip if set
	// TODO: Add username matching when available
	continue;
}


medium

The current logic for username matching will cause any rule with a username specified to be skipped, as the code immediately continues if rule->username is not null. The TODO comment indicates this might be intentional for now, but it effectively makes the username field in mcp_query_rules non-functional. This could be confusing for users trying to create rules based on usernames.

Comment on lines +2971 to +2972
// Input: {"filters": {"status": "active", "age": 25}}
// Output: {"filters":{"?":"?","?":"?"}}


medium

The example in the documentation for fingerprint_mcp_args is incorrect and does not match the implementation. The implementation preserves object keys, while the example suggests keys are also replaced with ?.

The example:
Input: {"filters": {"status": "active", "age": 25}}
Output: {"filters":{"?":"?","?":"?"}}

Based on the implementation, the output should be:
Output: {"filters":{"status":"?","age":"?"}}

Please update the comment to accurately reflect the function's behavior.
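
For reference, a self-contained sketch of the documented behavior (keys preserved, scalar values masked), followed by hashing the canonical text with SpookyHash as the PR describes. The function names, the tool-name/argument concatenation, and the "SpookyV2.h" header path are assumptions, not the actual implementation:

#include <cstdint>
#include <string>
#include <nlohmann/json.hpp>
#include "SpookyV2.h" // assumed location of the bundled SpookyHash V2 header

nlohmann::json fingerprint(const nlohmann::json& j) {
	if (j.is_object()) {
		nlohmann::json out = nlohmann::json::object();
		for (auto it = j.begin(); it != j.end(); ++it) {
			out[it.key()] = fingerprint(it.value()); // keys survive
		}
		return out;
	}
	if (j.is_array()) {
		nlohmann::json out = nlohmann::json::array();
		for (const auto& e : j) out.push_back(fingerprint(e));
		return out;
	}
	return "?"; // every scalar (string, number, bool, null) is masked
}

uint64_t digest_of(const std::string& tool, const nlohmann::json& args) {
	std::string canonical = tool + ":" + fingerprint(args).dump();
	return SpookyHash::Hash64(canonical.data(), canonical.size(), 0);
}

// fingerprint({"filters": {"status": "active", "age": 25}})
//   -> {"filters":{"age":"?","status":"?"}} (nlohmann::json orders keys)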

Comment on lines +2634 to +2645
char* a = (char*)"INSERT INTO stats_mcp_query_rules VALUES (\"%s\",\"%s\")";
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
	SQLite3_row* r = *it;
	int arg_len = 0;
	for (int i = 0; i < 2; i++) {
		arg_len += strlen(r->fields[i]);
	}
	char* query = (char*)malloc(strlen(a) + arg_len + 32);
	sprintf(query, a, r->fields[0], r->fields[1]);
	statsdb->execute(query);
	free(query);
}


medium

Using sprintf to construct SQL queries is generally unsafe and can lead to SQL injection vulnerabilities, even if the inputs seem safe. It's better to use prepared statements to handle parameter binding securely.

While in this specific case the inputs are generated from integers, adopting prepared statements as a standard practice will make the code more robust and secure against future changes.
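
For illustration, the same insert written against the raw SQLite C API's prepared statements (ProxySQL routes these calls through its own proxy_sqlite3_* wrappers, so the exact entry points here are an assumption):

#include <sqlite3.h>

void insert_rule_hits(sqlite3* db, const char* rule_id, const char* hits) {
	sqlite3_stmt* stmt = nullptr;
	const char* sql = "INSERT INTO stats_mcp_query_rules VALUES (?1, ?2)";
	if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK) return;
	sqlite3_bind_text(stmt, 1, rule_id, -1, SQLITE_TRANSIENT);
	sqlite3_bind_text(stmt, 2, hits, -1, SQLITE_TRANSIENT);
	sqlite3_step(stmt);     // bound values never touch the SQL text
	sqlite3_finalize(stmt); // no malloc/sprintf bookkeeping needed
}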

@renecannao changed the title from "feat: Add MCP query rules and digest statistics" to "feat: Add MCP query rules and digest statistics [WIP]" on Jan 19, 2026
@renecannao requested a review from Copilot on January 20, 2026 09:01
@renecannao
Author

@coderabbitai review


coderabbitai bot commented Jan 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


Copilot AI left a comment


Pull request overview

This pull request implements firewall functionality and query digest statistics for MCP (Model Context Protocol) queries in ProxySQL, mirroring the existing MySQL query rules and statistics infrastructure.

Changes:

  • Adds MCP query rules tables (mcp_query_rules, runtime_mcp_query_rules) with 17 columns for defining firewall rules
  • Implements query digest computation and statistics tracking (stats_mcp_query_digest, stats_mcp_query_rules)
  • Adds LOAD/SAVE commands for MCP query rules following ProxySQL patterns
  • Includes comprehensive documentation for Static Harvester, Query Tool Handler, and catalog isolation features

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 17 comments.

Summary per file:
  • lib/Static_Harvester.cpp: Adds comprehensive documentation for all static harvester functions and workflow
  • lib/Query_Tool_Handler.cpp: Implements MCP query rules evaluation and digest tracking integration into query execution flow
  • lib/ProxySQL_Admin_Stats.cpp: Adds stats collection functions for MCP query digest and rules (incomplete implementation)
  • lib/ProxySQL_Admin.cpp: Implements LOAD/SAVE commands for MCP query rules with proper escaping
  • lib/MySQL_Catalog.cpp: Documents schema isolation for multi-tenancy support
  • lib/Discovery_Schema.cpp: Core implementation of MCP query rules evaluation, digest computation, and statistics (has critical schema issues)
  • lib/Admin_Handler.cpp: Adds command parsing for MCP query rules LOAD/SAVE operations
  • lib/Admin_Bootstrap.cpp: Registers MCP query rules and stats tables in admin and config databases
  • include/proxysql_admin.h: Declares MCP query rules load/save and stats functions
  • include/ProxySQL_Admin_Tables_Definitions.h: Defines table schemas for MCP query rules and digest statistics
  • include/Discovery_Schema.h: Declares MCP query rule structures, processor output, and digest functions


SQLite3_result* Discovery_Schema::get_mcp_query_digest(bool reset) {
	SQLite3_result* result = new SQLite3_result();

	// Define columns (10 columns, not 11 - digest_text was duplicated)

Copilot AI Jan 20, 2026


The comment at line 2839 states "10 columns, not 11 - digest_text was duplicated", but counting the add_column_definition calls from lines 2840-2849, there are exactly 10 columns as expected. The comment is confusing because it suggests there was a duplication issue that was fixed, but this should be clarified or removed if it's no longer relevant.

Suggested change
// Define columns (10 columns, not 11 - digest_text was duplicated)
// Define columns for MCP query digest statistics

Copilot uses AI. Check for mistakes.
Comment on lines 2581 to 2602
void ProxySQL_Admin::stats___mcp_query_digest(bool reset) {
	if (!GloMCPH) return;
	Query_Tool_Handler* qth = GloMCPH->query_tool_handler;
	if (!qth) return;

	// Get the discovery schema catalog
	// Note: This is a simplified implementation that queries the catalog database
	// In a full implementation, we would access the Discovery_Schema directly
	statsdb->execute("BEGIN");

	if (reset) {
		statsdb->execute("DELETE FROM stats_mcp_query_digest_reset");
	} else {
		statsdb->execute("DELETE FROM stats_mcp_query_digest");
	}

	// For now, we'll leave the table empty since MCP digest stats are stored in memory
	// in the Discovery_Schema and would need to be accessed differently
	// TODO: Implement proper access to Discovery_Schema digest statistics

	statsdb->execute("COMMIT");
}

Copilot AI Jan 20, 2026


The function signature and documentation describe a parameter 'reset' for stats___mcp_query_digest, but the implementation is incomplete. The TODO comment at line 2599 indicates that the digest statistics access needs proper implementation. The function currently only deletes rows from the table but doesn't populate it with actual data from the Discovery_Schema. This will result in empty stats tables when queried.

Comment on lines +2634 to +2645
char* a = (char*)"INSERT INTO stats_mcp_query_rules VALUES (\"%s\",\"%s\")";
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
	SQLite3_row* r = *it;
	int arg_len = 0;
	for (int i = 0; i < 2; i++) {
		arg_len += strlen(r->fields[i]);
	}
	char* query = (char*)malloc(strlen(a) + arg_len + 32);
	sprintf(query, a, r->fields[0], r->fields[1]);
	statsdb->execute(query);
	free(query);
}

Copilot AI Jan 20, 2026


Memory leak potential: The result set returned from get_stats_mcp_query_rules is deleted at line 2647, but if the function returns early at lines 2582, 2584, or 2629, the resultset would not be cleaned up. However, reviewing the code more carefully, the resultset is only created inside the get_stats_mcp_query_rules call at line 2628, so early returns before that are safe. But there's still a potential issue - if any of the malloc/sprintf/execute operations fail between lines 2634-2645, the function should handle errors gracefully.

rule->tool_name = row->fields[4] ? strdup(row->fields[4]) : NULL; // tool_name
rule->match_pattern = row->fields[5] ? strdup(row->fields[5]) : NULL; // match_pattern
rule->negate_match_pattern = row->fields[6] ? atoi(row->fields[6]) != 0 : false; // negate_match_pattern
rule->re_modifiers = row->fields[7] ? atoi(row->fields[7]) : 1; // default CASELESS

Copilot AI Jan 20, 2026


Type mismatch for re_modifiers: The table definition stores re_modifiers as VARCHAR DEFAULT 'CASELESS' (line 342 in ProxySQL_Admin_Tables_Definitions.h and line 359 in Discovery_Schema.cpp), but the loading code at line 2404 attempts to parse it as an integer using atoi(). This will fail - atoi('CASELESS') returns 0, not the expected bitmask value. The code expects re_modifiers to be an integer bitmask (1=CASELESS per line 2420), but the schema defines it as a string. Either change the schema to INT or update the loading logic to parse string values like 'CASELESS' to the corresponding bitmask.
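
One way to reconcile the mismatch is to parse the stored string into the expected bitmask; the helper below is a hypothetical sketch, not code from this PR:

#include <cstdlib>
#include <strings.h> // strcasecmp (POSIX)

static int parse_re_modifiers(const char* s) {
	if (s == NULL || *s == '\0') return 1;        // default: CASELESS
	if (strcasecmp(s, "CASELESS") == 0) return 1; // bitmask bit 1, per the header comment
	return atoi(s); // still accepts schemas that store the integer directly
}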

Comment on lines 1602 to 1644
MCP_Query_Processor_Output* qpo = catalog->evaluate_mcp_query_rules(
	tool_name,
	schema,
	arguments,
	sql
);

// Check for OK_msg (return success without executing)
if (qpo->OK_msg) {
	unsigned long long duration = monotonic_time() - start_time;
	track_tool_invocation(this, tool_name, schema, duration);
	catalog->log_query_tool_call(tool_name, schema, 0, start_time, duration, "OK message from query rule");
	result = create_success_response(qpo->OK_msg);
	delete qpo;
	return result;
}

// Check for error_msg (block the query)
if (qpo->error_msg) {
	unsigned long long duration = monotonic_time() - start_time;
	track_tool_invocation(this, tool_name, schema, duration);
	catalog->log_query_tool_call(tool_name, schema, 0, start_time, duration, "Blocked by query rule");
	result = create_error_response(qpo->error_msg);
	delete qpo;
	return result;
}

// Apply rewritten query if provided
if (qpo->new_query) {
	sql = *qpo->new_query;
}

// Apply timeout if provided
if (qpo->timeout_ms > 0) {
	timeout_sec = qpo->timeout_ms / 1000;
}

// Apply log flag if set
if (qpo->log == 1) {
	// TODO: Implement query logging if needed
}

delete qpo;

Copilot AI Jan 20, 2026


Memory leak: The qpo object allocated at line 1602 is deleted at lines 1615, 1625, and 1644, but if execution continues past line 1644 and an exception is thrown in the query execution code (lines 1647-1693), the qpo object will not be deleted, causing a memory leak. Consider using RAII (std::unique_ptr) or ensuring cleanup in all code paths.
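
A sketch of the RAII variant with stubbed-in types (Output and respond() stand in for MCP_Query_Processor_Output and the create_*_response helpers):

#include <memory>

struct Output { const char* OK_msg = nullptr; const char* error_msg = nullptr; };
Output* evaluate_rules() { return new Output(); } // stands in for evaluate_mcp_query_rules
int respond(const char*) { return 0; }            // stands in for create_*_response

int handle_query() {
	std::unique_ptr<Output> qpo(evaluate_rules());
	if (qpo->OK_msg)    return respond(qpo->OK_msg);    // qpo freed automatically
	if (qpo->error_msg) return respond(qpo->error_msg); // likewise
	// ... execute the (possibly rewritten) query; an exception thrown here
	// still destroys qpo, closing the leak described above ...
	return 0;
}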

Comment on lines +7789 to +7793
if (success == false) {
	if (resultset) {
		free(resultset);
	}
}

Copilot AI Jan 20, 2026


Potential memory leak in error path: If resultset allocation succeeds but the execute_statement fails (returning an error), the function returns NULL at line 7789 without freeing the resultset. The condition at line 7789-7792 attempts to free the resultset only when success is false, but the resultset pointer may already be allocated by execute_statement even on error. The cleanup logic should verify whether resultset was allocated before freeing it.

Comment on lines +2800 to +2804
// Periodically persist to SQLite (every 100 updates or so)
static thread_local unsigned int update_count = 0;
if (++update_count % 100 == 0) {
	// TODO: Implement batch persistence
}

Copilot AI Jan 20, 2026


Thread safety issue: The static thread_local variable update_count at line 2801 is incremented without any synchronization. While thread_local ensures each thread has its own copy, if the same thread processes multiple queries concurrently (which is unlikely but possible in some threading models), this could lead to race conditions. However, since the persistence is only a TODO at line 2803, this is not currently a functional issue but should be addressed when implementing batch persistence.

Comment on lines 1634 to 1637
// Apply timeout if provided
if (qpo->timeout_ms > 0) {
	timeout_sec = qpo->timeout_ms / 1000;
}

Copilot AI Jan 20, 2026


The timeout_sec variable is declared but never used after being set at line 1636. The variable appears to be intended for future query timeout enforcement but is not actually applied to the query execution. This could mislead developers into thinking timeout enforcement is active when it's not.

Suggested change
// Apply timeout if provided
if (qpo->timeout_ms > 0) {
timeout_sec = qpo->timeout_ms / 1000;
}
// Note: qpo->timeout_ms is currently not enforced. Timeout handling
// should be implemented where queries are actually executed.
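
If the timeout is enforced later, note also that plain integer division truncates sub-second values (500 ms would become 0 s); a ceil-style conversion, as a sketch, avoids that:

int timeout_sec_from_ms(int timeout_ms) {
	if (timeout_ms <= 0) return 0;    // keep current behavior when unset
	return (timeout_ms + 999) / 1000; // 500 ms -> 1 s, 1500 ms -> 2 s
}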

// Output: {"sql":"?","timeout":"?"}
//
// Input: {"filters": {"status": "active", "age": 25}}
// Output: {"filters":{"?":"?","?":"?"}}

Copilot AI Jan 20, 2026


The comment at line 2971 shows an example output {"filters":{"?":"?","?":"?"}} which incorrectly suggests that object keys are replaced with "?". However, the code at line 2988 preserves the key using it.key(), so the correct output should be {"filters":{"status":"?","age":"?"}}. The documentation should be corrected to accurately reflect the fingerprinting behavior.

Suggested change
// Output: {"filters":{"?":"?","?":"?"}}
// Output: {"filters":{"status":"?","age":"?"}}


// Check schemaname match
if (rule->schemaname) {
	if (!schemaname.empty() && strcmp(rule->schemaname, schemaname.c_str()) != 0) {

Copilot AI Jan 20, 2026


The condition at line 2511 checks if schemaname is not empty before comparing with rule->schemaname, but it doesn't handle the case where rule->schemaname is NULL and schemaname is empty. If rule->schemaname is NULL, it should match an empty schemaname parameter (meaning "match all schemas"). The current logic will skip the rule if schemaname is empty but rule->schemaname is not NULL, which may not be the intended behavior. Consider: if rule->schemaname is NULL, it should match any schema; if rule->schemaname is not NULL and schemaname is not empty, only match if they're equal.

Suggested change
if (!schemaname.empty() && strcmp(rule->schemaname, schemaname.c_str()) != 0) {
// Rules with an explicit schemaname only apply when a matching
// non-empty schemaname is provided.
if (schemaname.empty() || strcmp(rule->schemaname, schemaname.c_str()) != 0) {


coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
lib/MySQL_Catalog.cpp (3)

278-301: SQL injection vulnerability: user input concatenated directly into query.

The schema, kind, tags, and query parameters are concatenated directly into the SQL string without sanitization or parameterized binding. An attacker could inject malicious SQL.

🔒 Recommended fix using parameterized queries
 std::string MySQL_Catalog::search(
 	const std::string& schema,
 	const std::string& query,
 	const std::string& kind,
 	const std::string& tags,
 	int limit,
 	int offset
 ) {
-	std::ostringstream sql;
-	sql << "SELECT schema, kind, key, document, tags ,  links FROM catalog WHERE 1=1";
-
-	// Add schema filter
-	if (!schema.empty()) {
-		sql << " AND schema = '" << schema << "'";
-	}
-
-	// Add kind filter
-	if (!kind.empty()) {
-		sql << " AND kind = '" << kind << "'";
-	}
-
-	// Add tags filter
-	if (!tags.empty()) {
-		sql << " AND tags LIKE '%" << tags << "%'";
-	}
-
-	// Add search query
-	if (!query.empty()) {
-		sql << " AND (key LIKE '%" << query << "%' "
-		     << "OR document LIKE '%" << query << "%' "
-		     << "OR tags LIKE '%" << query << "%')";
-	}
+	// Use parameterized query to prevent SQL injection
+	sqlite3_stmt* stmt = NULL;
+	std::string sql = "SELECT schema, kind, key, document, tags, links FROM catalog WHERE 1=1";
+	
+	std::vector<std::string> bindings;
+	if (!schema.empty()) {
+		sql += " AND schema = ?";
+		bindings.push_back(schema);
+	}
+	if (!kind.empty()) {
+		sql += " AND kind = ?";
+		bindings.push_back(kind);
+	}
+	// ... continue with parameterized bindings for tags and query

368-390: SQL injection vulnerability in list() method.

Same issue as search(): schema and kind are concatenated directly into SQL without sanitization.


505-511: SQL injection vulnerability in remove() method.

The schema, kind, and key parameters are directly concatenated into the DELETE statement.

🔒 Recommended fix
 int MySQL_Catalog::remove(
 	const std::string& schema,
 	const std::string& kind,
 	const std::string& key
 ) {
-	std::ostringstream sql;
-	sql << "DELETE FROM catalog WHERE 1=1";
-
-	if (!schema.empty()) {
-		sql << " AND schema = '" << schema << "'";
-	}
-	sql << " AND kind = '" << kind << "' AND key = '" << key << "'";
-
-	if (!db->execute(sql.str().c_str())) {
+	sqlite3_stmt* stmt = NULL;
+	const char* delete_sql = schema.empty() 
+		? "DELETE FROM catalog WHERE kind = ?1 AND key = ?2"
+		: "DELETE FROM catalog WHERE schema = ?1 AND kind = ?2 AND key = ?3";
+	
+	int rc = db->prepare_v2(delete_sql, &stmt);
+	if (rc != SQLITE_OK) {
+		proxy_error("Failed to prepare catalog remove: %d\n", rc);
+		return -1;
+	}
+	
+	int idx = 1;
+	if (!schema.empty()) {
+		(*proxy_sqlite3_bind_text)(stmt, idx++, schema.c_str(), -1, SQLITE_TRANSIENT);
+	}
+	(*proxy_sqlite3_bind_text)(stmt, idx++, kind.c_str(), -1, SQLITE_TRANSIENT);
+	(*proxy_sqlite3_bind_text)(stmt, idx++, key.c_str(), -1, SQLITE_TRANSIENT);
+	
+	SAFE_SQLITE3_STEP2(stmt);
+	(*proxy_sqlite3_finalize)(stmt);
+	
+	return 0;
lib/Static_Harvester.cpp (3)

398-423: SQL injection in fetch_schemas; use parameterized query.

The filter parameter is concatenated directly into the SQL query. While this may only be called internally, using parameterized queries is a best practice for defense-in-depth.

🔒 Suggested fix
 std::vector<Static_Harvester::SchemaRow> Static_Harvester::fetch_schemas(const std::string& filter) {
 	std::vector<SchemaRow> schemas;
 
-	std::ostringstream sql;
-	sql << "SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME "
-	    << "FROM information_schema.SCHEMATA";
-
-	if (!filter.empty()) {
-		sql << " WHERE SCHEMA_NAME = '" << filter << "'";
-	}
-
-	sql << " ORDER BY SCHEMA_NAME;";
+	std::string sql = "SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME "
+	    "FROM information_schema.SCHEMATA";
+	
+	if (!filter.empty()) {
+		// Escape single quotes in filter to prevent injection
+		std::string escaped_filter;
+		for (char c : filter) {
+			if (c == '\'') escaped_filter += "''";
+			else escaped_filter += c;
+		}
+		sql += " WHERE SCHEMA_NAME = '" + escaped_filter + "'";
+	}
+	
+	sql += " ORDER BY SCHEMA_NAME;";

Or better, use MySQL prepared statements via mysql_stmt_* APIs.
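
A sketch of that mysql_stmt_* route, with illustrative names (the actual harvester query selects more columns):

#include <mysql/mysql.h>
#include <cstring>

bool fetch_schema(MYSQL* conn, const char* schema_name) {
	const char* sql =
		"SELECT SCHEMA_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME = ?";
	MYSQL_STMT* stmt = mysql_stmt_init(conn);
	if (!stmt || mysql_stmt_prepare(stmt, sql, strlen(sql))) return false;

	MYSQL_BIND param;
	memset(&param, 0, sizeof(param));
	unsigned long len = strlen(schema_name);
	param.buffer_type = MYSQL_TYPE_STRING;
	param.buffer = (void*)schema_name;
	param.buffer_length = len;
	param.length = &len;

	bool ok = !mysql_stmt_bind_param(stmt, &param) && !mysql_stmt_execute(stmt);
	mysql_stmt_close(stmt); // the filter value never touches the SQL text
	return ok;
}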


479-513: Same SQL injection pattern in fetch_tables_views.

The filter parameter is directly concatenated. Apply the same fix as fetch_schemas.


1036-1055: SQL injection in harvest_view_definitions via view_def.

Line 1045 inserts view_def directly into an UPDATE statement. View definitions from MySQL could contain single quotes or malicious content.

🔒 Use parameterized update
-		std::ostringstream update_sql;
-		update_sql << "UPDATE objects SET definition_sql = '" << view_def << "' "
-		           << "WHERE run_id = " << current_run_id
-		           << " AND schema_name = '" << schema_name << "'"
-		           << " AND object_name = '" << view_name << "'"
-		           << " AND object_type = 'view';";
-
-		catalog->get_db()->execute_statement(update_sql.str().c_str(), &error, &cols, &affected);
+		sqlite3_stmt* stmt = NULL;
+		const char* update_sql = "UPDATE objects SET definition_sql = ?1 "
+			"WHERE run_id = ?2 AND schema_name = ?3 AND object_name = ?4 AND object_type = 'view'";
+		
+		int rc = catalog->get_db()->prepare_v2(update_sql, &stmt);
+		if (rc == SQLITE_OK) {
+			(*proxy_sqlite3_bind_text)(stmt, 1, view_def.c_str(), -1, SQLITE_TRANSIENT);
+			(*proxy_sqlite3_bind_int)(stmt, 2, current_run_id);
+			(*proxy_sqlite3_bind_text)(stmt, 3, schema_name.c_str(), -1, SQLITE_TRANSIENT);
+			(*proxy_sqlite3_bind_text)(stmt, 4, view_name.c_str(), -1, SQLITE_TRANSIENT);
+			SAFE_SQLITE3_STEP2(stmt);
+			(*proxy_sqlite3_finalize)(stmt);
+		}
lib/Discovery_Schema.cpp (2)

1256-1267: SQL injection in fts_search via query parameter.

The query, object_type, and schema_name parameters are directly concatenated into the FTS MATCH clause and WHERE conditions.

🔒 Escape or validate FTS query input

FTS5 MATCH queries require special handling. At minimum, escape single quotes:

-	sql << "SELECT object_key, schema_name, object_name, object_type, tags ,  bm25(fts_objects) AS score "
-	    << "FROM fts_objects WHERE fts_objects MATCH '" << query << "'";
+	// Escape single quotes for FTS query
+	std::string escaped_query;
+	for (char c : query) {
+		if (c == '\'') escaped_query += "''";
+		else escaped_query += c;
+	}
+	sql << "SELECT object_key, schema_name, object_name, object_type, tags, bm25(fts_objects) AS score "
+	    << "FROM fts_objects WHERE fts_objects MATCH '" << escaped_query << "'";

2075-2087: SQL injection in fts_search_llm.

Same issue — the query parameter is directly concatenated into the MATCH clause.

🤖 Fix all issues with AI agents
In `@lib/Admin_Handler.cpp`:
- Around line 2362-2383: The current INSERT OR REPLACE SELECT statements in the
LOAD and SAVE branches (where *q is set via l_strdup to "INSERT OR REPLACE INTO
main.mcp_query_rules SELECT * FROM disk.mcp_query_rules" and "INSERT OR REPLACE
INTO disk.mcp_query_rules SELECT * FROM main.mcp_query_rules") do not remove
source-deleted rows and can resurrect stale rules; modify the SQL assigned to *q
in both branches so the target table is cleared before copying (e.g., issue a
DELETE FROM <target_table> or use a transactional sequence "DELETE FROM
<target_table>; INSERT INTO <target_table> SELECT ...") while keeping the same
surrounding logic (the blocks that test query_no_space_length and call l_free,
l_strdup, set *ql and return true) and ensure both the LOAD (target
main.mcp_query_rules) and SAVE (target disk.mcp_query_rules) cases are updated.

In `@lib/Discovery_Schema.cpp`:
- Around line 2655-2684: The code allocates pta with malloc and many strdup
calls (symbols: pta, malloc, strdup, result->add_row) without checking for NULL;
add checks after malloc and after each strdup (or use a helper that allocates
and duplicates safely) so that on allocation failure you free any
already-allocated entries, free pta, and avoid calling result->add_row (return
an error or handle gracefully). Specifically: verify pta != NULL immediately
after malloc, check each strdup result before storing it into pta (or wrap
duplication to return NULL on failure and propagate), on any allocation failure
free any non-NULL pta[i] and pta itself, and only call result->add_row when all
required allocations succeeded.
- Around line 2851-2895: The code has a TOCTOU race: it unlocks the read lock
(pthread_rwlock_rdlock / pthread_rwlock_unlock) then later acquires the write
lock (pthread_rwlock_wrlock) when reset is true, allowing concurrent
modification of mcp_digest_umap; change the locking so that if reset is true you
acquire the write lock before iterating (i.e., call pthread_rwlock_wrlock
instead of pthread_rwlock_rdlock at the start when reset is true) and hold that
lock for both building rows and clearing/deleting entries (use mcp_digest_umap,
MCP_Query_Digest_Stats, result->add_row references), otherwise keep the current
read-lock path—ensure you do not double-lock and that the unlock used matches
the lock acquired. (A sketch of this locking discipline follows after this list.)

In `@lib/ProxySQL_Admin.cpp`:
- Around line 1358-1363: The bug is that stats_mcp_query_digest_reset contains
stats_mcp_query_digest so both flags get set; update the checks in the block
that reads query_no_space so the reset path is handled exclusively (mirror
MySQL/PGSQL): test for "stats_mcp_query_digest_reset" before testing for
"stats_mcp_query_digest" (or use a precise token/boundary match) so only
stats_mcp_query_digest_reset becomes true and refresh is set once; adjust the
logic around the stats_mcp_query_digest_reset and stats_mcp_query_digest flag
assignments to avoid double-triggering the reset and non-reset handling.

In `@lib/Query_Tool_Handler.cpp`:
- Around line 1635-1637: The conversion of qpo->timeout_ms to timeout_sec uses
integer division which truncates sub-second values to zero (e.g., 500ms => 0);
change the conversion logic in Query_Tool_Handler.cpp around the timeout
handling (variable timeout_sec and qpo->timeout_ms) to round up instead of
truncating so any non-zero millisecond timeout becomes at least 1 second (e.g.,
replace the direct /1000 integer division with a ceil-style conversion such as
(timeout_ms + 999)/1000 or use floating-point division plus ceil) and keep
existing behavior when timeout_ms is 0 or negative.
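
To make the Discovery_Schema locking fix above concrete, a minimal sketch of the single-lock discipline it describes (member names follow the review text; real code must also delete the MCP_Query_Digest_Stats objects before clearing the map):

#include <cstdint>
#include <pthread.h>
#include <string>
#include <unordered_map>

static pthread_rwlock_t digest_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static std::unordered_map<std::string, std::unordered_map<uint64_t, void*>> mcp_digest_umap;

void export_digests(bool reset) {
	if (reset) pthread_rwlock_wrlock(&digest_rwlock); // exclusive: we will mutate
	else       pthread_rwlock_rdlock(&digest_rwlock); // shared: read-only export
	for (auto& entry : mcp_digest_umap) {
		(void)entry; // ... build result rows from the digest stats here ...
	}
	if (reset) mcp_digest_umap.clear(); // safe: still under the write lock
	pthread_rwlock_unlock(&digest_rwlock); // matches whichever lock was taken
}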
♻️ Duplicate comments (4)
lib/ProxySQL_Admin_Stats.cpp (2)

2581-2602: Implementation remains incomplete as previously noted.

The function deletes existing rows but never inserts new data, leaving the table permanently empty. The TODO comment at lines 2597-2599 acknowledges this. The digest statistics are stored in memory in the Discovery_Schema and need to be accessed via get_mcp_query_digest() to complete this implementation.


2634-2645: The sprintf pattern follows existing codebase conventions.

This implementation mirrors the pattern used in stats___mysql_query_rules() (lines 1385-1396) and stats___pgsql_query_rules() (lines 1407-1418). While prepared statements would be safer as previously noted, this is consistent with the established codebase style. The input data (rule_id and hits) comes from the catalog's internal integer counters, limiting injection risk in practice.

lib/Discovery_Schema.cpp (2)

2372-2387: Duplicate cleanup code — consider extracting helper.

This cleanup logic is nearly identical to the destructor code at lines 34-48. Extracting a private helper method (e.g., clear_mcp_query_rules()) would reduce duplication and maintenance burden.


2503-2507: Username matching effectively disabled.

The current logic skips any rule with a username set, making the username field non-functional. The TODO indicates this is intentional for now, but users may be confused when their username-based rules don't work.

🧹 Nitpick comments (7)
lib/Admin_Bootstrap.cpp (1)

911-914: Minor: Clarify the comment on line 914.

The comment "Reuse same schema for stats" is slightly misleading. The stats_mcp_query_rules table has a different, simpler schema (just rule_id and hits) compared to the full mcp_query_rules schema. Consider updating the comment to better reflect this.

📝 Suggested comment improvement
 	// MCP query digest stats
 	insert_into_tables_defs(tables_defs_stats,"stats_mcp_query_digest", STATS_SQLITE_TABLE_MCP_QUERY_DIGEST);
 	insert_into_tables_defs(tables_defs_stats,"stats_mcp_query_digest_reset", STATS_SQLITE_TABLE_MCP_QUERY_DIGEST_RESET);
-	insert_into_tables_defs(tables_defs_stats,"stats_mcp_query_rules", STATS_SQLITE_TABLE_MCP_QUERY_RULES); // Reuse same schema for stats
+	insert_into_tables_defs(tables_defs_stats,"stats_mcp_query_rules", STATS_SQLITE_TABLE_MCP_QUERY_RULES);
lib/MySQL_Catalog.cpp (2)

53-66: VLA is non-standard C++; prefer stack-allocated fixed buffer or std::vector.

Line 56 uses a variable-length array (VLA) which is a GCC extension and not standard C++. This can cause stack overflow for large paths.

♻️ Suggested fix
 int MySQL_Catalog::init() {
 	// Initialize database connection
 	db = new SQLite3DB();
-	char path_buf[db_path.size() + 1];
-	strcpy(path_buf, db_path.c_str());
-	int rc = db->open(path_buf, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
+	int rc = db->open((char*)db_path.c_str(), SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
 	if (rc != SQLITE_OK) {

476-488: Merge uses hardcoded empty schema and fixed kinds; document or parameterize.

The merge operation hardcodes empty schema "" and only searches for "table" and "view" kinds. This may limit flexibility for future use cases.

include/Discovery_Schema.h (2)

93-132: Mixed ownership semantics in MCP_Query_Processor_Output.

new_query uses new/delete while error_msg and OK_msg use strdup/free. This inconsistency increases cognitive load and bug risk. Consider unifying ownership using std::string or std::unique_ptr<std::string> for all string members.

♻️ Suggested modernization
 struct MCP_Query_Processor_Output {
-	std::string *new_query;   // Rewritten query (caller must delete)
+	std::unique_ptr<std::string> new_query;   // Rewritten query
 	int timeout_ms;           // Query timeout in milliseconds (-1 = not set)
-	char *error_msg;          // Error message to return (NULL = not set)
-	char *OK_msg;             // OK message to return (NULL = not set)
+	std::string error_msg;    // Error message (empty = not set)
+	std::string OK_msg;       // OK message (empty = not set)
 	int log;                  // Whether to log this query (-1 = not set, 0 = no, 1 = yes)
 	int next_query_flagIN;    // Flag for next query (-1 = not set)

149-156: Consider using a typed pointer instead of void* for digest stats map.

The mcp_digest_umap stores void* which loses type safety. Since the value type is always MCP_Query_Digest_Stats*, using the explicit type would prevent casting errors.

♻️ Suggested improvement
 	// MCP query digest statistics
-	std::unordered_map<std::string, std::unordered_map<uint64_t, void*>> mcp_digest_umap;
+	std::unordered_map<std::string, std::unordered_map<uint64_t, MCP_Query_Digest_Stats*>> mcp_digest_umap;
lib/Static_Harvester.cpp (1)

150-154: Hardcoded 30-second timeout; consider making configurable.

The connection timeouts are hardcoded. For environments with varying network latencies, this might be too short or unnecessarily long.

lib/Discovery_Schema.cpp (1)

62-75: VLA usage; same issue as MySQL_Catalog.

Line 65 uses a variable-length array. Apply the same fix as recommended for MySQL_Catalog.

Comment on lines 2362 to 2383
// LOAD MCP QUERY RULES FROM DISK / TO MEMORY
// Copies rules from persistent storage (disk.mcp_query_rules) to working memory (main.mcp_query_rules)
if (
	(query_no_space_length == strlen("LOAD MCP QUERY RULES FROM DISK") && !strncasecmp("LOAD MCP QUERY RULES FROM DISK", query_no_space, query_no_space_length))
	||
	(query_no_space_length == strlen("LOAD MCP QUERY RULES TO MEMORY") && !strncasecmp("LOAD MCP QUERY RULES TO MEMORY", query_no_space, query_no_space_length))
) {
	l_free(*ql,*q);
	*q=l_strdup("INSERT OR REPLACE INTO main.mcp_query_rules SELECT * FROM disk.mcp_query_rules");
	*ql=strlen(*q)+1;
	return true;
}

// SAVE MCP QUERY RULES TO DISK
// Copies rules from working memory (main.mcp_query_rules) to persistent storage (disk.mcp_query_rules)
if (
	(query_no_space_length == strlen("SAVE MCP QUERY RULES TO DISK") && !strncasecmp("SAVE MCP QUERY RULES TO DISK", query_no_space, query_no_space_length))
) {
	l_free(*ql,*q);
	*q=l_strdup("INSERT OR REPLACE INTO disk.mcp_query_rules SELECT * FROM main.mcp_query_rules");
	*ql=strlen(*q)+1;
	return true;


⚠️ Potential issue | 🟠 Major

Prevent stale MCP query rules when syncing disk ↔ memory.

INSERT OR REPLACE ... SELECT doesn’t remove rows deleted on the source side, so obsolete rules can persist and later be re-activated on runtime loads. Clear the target table before copying (or use a dedicated flush helper) to keep disk/memory in sync.

🧩 Proposed fix (clear target before copy)
-			l_free(*ql,*q);
-			*q=l_strdup("INSERT OR REPLACE INTO main.mcp_query_rules SELECT * FROM disk.mcp_query_rules");
-			*ql=strlen(*q)+1;
-			return true;
+			ProxySQL_Admin* SPA = (ProxySQL_Admin*)pa;
+			SPA->admindb->execute("DELETE FROM main.mcp_query_rules");
+			SPA->admindb->execute("INSERT OR REPLACE INTO main.mcp_query_rules SELECT * FROM disk.mcp_query_rules");
+			SPA->send_ok_msg_to_client(sess, NULL, 0, query_no_space);
+			return false;
-			l_free(*ql,*q);
-			*q=l_strdup("INSERT OR REPLACE INTO disk.mcp_query_rules SELECT * FROM main.mcp_query_rules");
-			*ql=strlen(*q)+1;
-			return true;
+			ProxySQL_Admin* SPA = (ProxySQL_Admin*)pa;
+			SPA->admindb->execute("DELETE FROM disk.mcp_query_rules");
+			SPA->admindb->execute("INSERT OR REPLACE INTO disk.mcp_query_rules SELECT * FROM main.mcp_query_rules");
+			SPA->send_ok_msg_to_client(sess, NULL, 0, query_no_space);
+			return false;
🤖 Prompt for AI Agents
In `@lib/Admin_Handler.cpp` around lines 2362 - 2383, The current INSERT OR
REPLACE SELECT statements in the LOAD and SAVE branches (where *q is set via
l_strdup to "INSERT OR REPLACE INTO main.mcp_query_rules SELECT * FROM
disk.mcp_query_rules" and "INSERT OR REPLACE INTO disk.mcp_query_rules SELECT *
FROM main.mcp_query_rules") do not remove source-deleted rows and can resurrect
stale rules; modify the SQL assigned to *q in both branches so the target table
is cleared before copying (e.g., issue a DELETE FROM <target_table> or use a
transactional sequence "DELETE FROM <target_table>; INSERT INTO <target_table>
SELECT ...") while keeping the same surrounding logic (the blocks that test
query_no_space_length and call l_free, l_strdup, set *ql and return true) and
ensure both the LOAD (target main.mcp_query_rules) and SAVE (target
disk.mcp_query_rules) cases are updated.

Comment on lines +2655 to +2684
char** pta = (char**)malloc(sizeof(char*) * 17);

pta[0] = strdup(std::to_string(rule->rule_id).c_str()); // rule_id
pta[1] = strdup(std::to_string(rule->active ? 1 : 0).c_str()); // active
pta[2] = rule->username ? strdup(rule->username) : NULL; // username
pta[3] = rule->schemaname ? strdup(rule->schemaname) : NULL; // schemaname
pta[4] = rule->tool_name ? strdup(rule->tool_name) : NULL; // tool_name
pta[5] = rule->match_pattern ? strdup(rule->match_pattern) : NULL; // match_pattern
pta[6] = strdup(std::to_string(rule->negate_match_pattern ? 1 : 0).c_str()); // negate_match_pattern
pta[7] = strdup(std::to_string(rule->re_modifiers).c_str()); // re_modifiers
pta[8] = strdup(std::to_string(rule->flagIN).c_str()); // flagIN
pta[9] = strdup(std::to_string(rule->flagOUT).c_str()); // flagOUT
pta[10] = rule->replace_pattern ? strdup(rule->replace_pattern) : NULL; // replace_pattern
pta[11] = strdup(std::to_string(rule->timeout_ms).c_str()); // timeout_ms
pta[12] = rule->error_msg ? strdup(rule->error_msg) : NULL; // error_msg
pta[13] = rule->ok_msg ? strdup(rule->ok_msg) : NULL; // OK_msg
pta[14] = strdup(std::to_string(rule->log ? 1 : 0).c_str()); // log
pta[15] = strdup(std::to_string(rule->apply ? 1 : 0).c_str()); // apply
pta[16] = rule->comment ? strdup(rule->comment) : NULL; // comment

result->add_row(pta);

// Free the row data
for (int j = 0; j < 17; j++) {
if (pta[j]) {
free(pta[j]);
}
}
free(pta);
}
⚠️ Potential issue | 🟡 Minor

No null checks after malloc/strdup allocations.

If malloc or strdup returns NULL due to memory exhaustion, the subsequent operations will cause a crash. While rare, this is flagged by static analysis.

Consider adding null checks or using a wrapper that handles allocation failures consistently across the codebase.
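A minimal sketch of such a wrapper (safe_strdup is a hypothetical name, not an existing ProxySQL helper, and the fail-fast policy is an assumption):

#include <cstdio>   // perror
#include <cstdlib>  // abort
#include <cstring>  // strdup

// Hypothetical helper: duplicates s, passing NULL through unchanged, and
// fails fast on allocation failure instead of crashing on a later dereference.
static char* safe_strdup(const char* s) {
    if (s == NULL) return NULL;
    char* copy = strdup(s);
    if (copy == NULL) {
        perror("strdup");
        abort();
    }
    return copy;
}

Each pta[i] = rule->field ? strdup(rule->field) : NULL line would then collapse to pta[i] = safe_strdup(rule->field), and the initial malloc of pta would get an analogous check.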

🧰 Tools
🪛 Cppcheck (2.19.0)

[warning] 2657-2673, 2679: If memory allocation fails, then there is a possible null pointer dereference (nullPointerOutOfMemory) — the same warning is emitted once per allocation, for each of lines 2657 through 2673 and for line 2679.

🤖 Prompt for AI Agents
In `@lib/Discovery_Schema.cpp` around lines 2655 - 2684, The code allocates pta
with malloc and many strdup calls (symbols: pta, malloc, strdup,
result->add_row) without checking for NULL; add checks after malloc and after
each strdup (or use a helper that allocates and duplicates safely) so that on
allocation failure you free any already-allocated entries, free pta, and avoid
calling result->add_row (return an error or handle gracefully). Specifically:
verify pta != NULL immediately after malloc, check each strdup result before
storing it into pta (or wrap duplication to return NULL on failure and
propagate), on any allocation failure free any non-NULL pta[i] and pta itself,
and only call result->add_row when all required allocations succeeded.

Comment on lines 1635 to 1637
if (qpo->timeout_ms > 0) {
timeout_sec = qpo->timeout_ms / 1000;
}
⚠️ Potential issue | 🟡 Minor

Integer division truncates sub-second timeouts to zero.

The conversion qpo->timeout_ms / 1000 truncates values less than 1000ms to 0 seconds. If a rule specifies a 500ms timeout, timeout_sec becomes 0, which may cause unexpected behavior depending on how zero is handled downstream.

Consider rounding up so that any non-zero timeout is at least 1 second; for example, (500 + 999) / 1000 = 1:

Suggested fix
 			// Apply timeout if provided
 			if (qpo->timeout_ms > 0) {
-				timeout_sec = qpo->timeout_ms / 1000;
+				timeout_sec = (qpo->timeout_ms + 999) / 1000;  // Round up to at least 1 sec
 			}
🤖 Prompt for AI Agents
In `@lib/Query_Tool_Handler.cpp` around lines 1635 - 1637, The conversion of
qpo->timeout_ms to timeout_sec uses integer division which truncates sub-second
values to zero (e.g., 500ms => 0); change the conversion logic in
Query_Tool_Handler.cpp around the timeout handling (variable timeout_sec and
qpo->timeout_ms) to round up instead of truncating so any non-zero millisecond
timeout becomes at least 1 second (e.g., replace the direct /1000 integer
division with a ceil-style conversion such as (timeout_ms + 999)/1000 or use
floating-point division plus ceil) and keep existing behavior when timeout_ms is
0 or negative.

Fixed 6 legitimate issues from AI code review:

Critical fixes:
- Implement stats___mcp_query_digest to populate table with actual data
- Fix double trigger bug (stats_mcp_query_digest_reset substring matching)

Important fixes:
- Fix re_modifiers parsing (VARCHAR "CASELESS" -> int 1)
- Fix TOCTOU race condition in reset path (use write lock from start)
- Add column count validation before accessing row fields

Documentation:
- Add memory ownership documentation for evaluate_mcp_query_rules

False positives ignored (Issue 2: Schema mismatch, Issue 6: SQL injection)
@renecannao
Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Jan 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@lib/ProxySQL_Admin_Stats.cpp`:
- Around line 2605-2625: The current loop builds SQL via sprintf into query
(using template a) which allows quotes/JSON in digest_text (and other
r->fields[]) to break SQL or permit injection; replace this with a prepared
INSERT into stats_mcp_query_digest and bind the 10 values from
r->fields[0]..r->fields[9] for each SQLite3_row instead of sprintf/malloc.
Locate the loop using SQLite3_row* r and statsdb->execute and change it to
prepare a parameterized statement once (e.g., "INSERT INTO
stats_mcp_query_digest VALUES (?,?,?,?,?,?,?,?,?,?)"), bind each r->fields[i]
per row, execute/reset the statement for the next row, and remove the
sprintf/malloc usage to eliminate injection and quoting issues.
- Around line 2594-2606: The reset path deletes from
stats_mcp_query_digest_reset but the subsequent insert always writes into
stats_mcp_query_digest; update the insert target based on the reset flag so the
reset branch re-populates stats_mcp_query_digest_reset. Concretely, change how
the INSERT string (variable a) is chosen before the loop in the block that calls
statsdb->execute and iterates resultset->rows: if reset is true set a to an
INSERT INTO stats_mcp_query_digest_reset ... string, otherwise keep the INSERT
INTO stats_mcp_query_digest ... string, then use that chosen string inside the
insertion loop so inserts go to the correct table.

In `@lib/ProxySQL_Admin.cpp`:
- Around line 7789-7793: The cleanup code is calling free(resultset) on an
object returned by execute_statement (an instance of SQLite3_result), which is
undefined behavior; replace that call with the appropriate C++ delete (i.e.,
delete resultset) and ensure any other cleanup paths that free resultset are
updated similarly so SQLite3_result is destroyed properly (look for the
resultset variable and usages around execute_statement and the success==false
branch in ProxySQL_Admin.cpp).
♻️ Duplicate comments (8)
lib/ProxySQL_Admin_Stats.cpp (1)

2662-2673: Same concern as prior review: raw sprintf with unescaped fields.

This is the same SQL construction pattern that was previously flagged. Please use prepared statements or escape values to avoid injection/breakage.

lib/Discovery_Schema.cpp (7)

34-48: Duplicate cleanup logic with load_mcp_query_rules.

This cleanup block is nearly identical to lines 2373-2387 in load_mcp_query_rules. Consider extracting to a private helper function (e.g., clear_mcp_query_rules()) to eliminate duplication and reduce maintenance risk.
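A possible shape for that helper (a sketch only: the class name, the rules container, and the assumption that string members are strdup()'d while rule objects are new'd are inferred, not confirmed by the PR):

// Hypothetical private helper consolidating the duplicated cleanup blocks.
// free(NULL) is a no-op, so the string members need no individual null checks.
void MCP_Query_Rules_Manager::clear_mcp_query_rules() {
    for (MCP_Query_Rule* rule : rules) {
        free(rule->username);
        free(rule->schemaname);
        free(rule->tool_name);
        free(rule->match_pattern);
        free(rule->replace_pattern);
        free(rule->error_msg);
        free(rule->ok_msg);
        free(rule->comment);
        delete rule;
    }
    rules.clear();
}

load_mcp_query_rules and the block at lines 34-48 could then both call this helper instead of repeating the loop.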


351-372: Schema mismatch between Discovery_Schema and admin tables.

This mcp_query_rules table schema differs from the admin table definitions (includes run_id, action, hits columns not present in admin tables). If rules are loaded from admin tables via load_mcp_query_rules, the column layouts won't align.

Verify whether this table is actually used or if it should be removed/synchronized with admin table definitions.


2529-2533: Username matching is non-functional.

Rules with username specified are always skipped since username context isn't available in MCP. This makes the username field in mcp_query_rules effectively unusable. The TODO should be tracked as a follow-up issue.


2632-2635: Redundant rule->apply check.

At line 2633, rule->apply is guaranteed to be true because log-only rules (!rule->apply) already continue at line 2575. The condition is logically redundant.

♻️ Suggested simplification
-			// If apply is true and not a log-only rule, stop processing further rules
-			if (rule->apply) {
-				break;
-			}
+			// Stop processing further rules (log-only rules already continued above)
+			break;

2865-2865: Confusing historical comment.

The comment mentions "digest_text was duplicated" which references a past bug fix rather than describing current state. Consider simplifying to just document the current column count.

📝 Suggested simplification
-	// Define columns (10 columns, not 11 - digest_text was duplicated)
+	// Define columns for MCP query digest statistics (10 columns)

2681-2710: No null checks after malloc/strdup allocations.

If malloc or strdup returns NULL due to memory exhaustion, subsequent operations will cause undefined behavior. While OOM is rare, the static analyzer flags this consistently across the function.

Consider adding a null check after malloc or using a wrapper that handles allocation failures consistently.


3000-3001: Incorrect example in documentation.

The example output {"filters":{"?":"?","?":"?"}} is wrong. The code at line 3017 uses it.key() which preserves object keys. The correct output should be:

Output: {"filters":{"status":"?","age":"?"}}
📝 Suggested fix
-//   Output: {"filters":{"?":"?","?":"?"}}
+//   Output: {"filters":{"status":"?","age":"?"}}
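A minimal sketch of the key-preserving normalization being documented (assuming nlohmann::json, which the it.key() idiom suggests; ordered_json is used here so the output matches the documented key order — the actual digest code may differ):

#include <nlohmann/json.hpp>
#include <iostream>

int main() {
    // ordered_json preserves insertion order, matching the documented output.
    nlohmann::ordered_json j = nlohmann::ordered_json::parse(
        R"({"filters":{"status":"active","age":30}})");
    // Replace each value with "?" while keeping the original keys (it.key()).
    for (auto it = j["filters"].begin(); it != j["filters"].end(); ++it) {
        *it = "?";
    }
    std::cout << j.dump() << std::endl;  // {"filters":{"status":"?","age":"?"}}
    return 0;
}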

Comment on lines +2594 to +2606
statsdb->execute("BEGIN");

if (reset) {
statsdb->execute("DELETE FROM stats_mcp_query_digest_reset");
} else {
statsdb->execute("DELETE FROM stats_mcp_query_digest");
}

// Insert digest statistics into the stats table
// Columns: tool_name, run_id, digest, digest_text, count_star,
// first_seen, last_seen, sum_time, min_time, max_time
char* a = (char*)"INSERT INTO stats_mcp_query_digest VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
⚠️ Potential issue | 🟠 Major

Reset path inserts into the wrong table.

When reset is true you delete stats_mcp_query_digest_reset, but the insert always targets stats_mcp_query_digest. That leaves the reset table empty and overwrites the non‑reset view on reset reads.

🛠️ Suggested fix
-    char* a = (char*)"INSERT INTO stats_mcp_query_digest VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
+    const char* table = reset ? "stats_mcp_query_digest_reset" : "stats_mcp_query_digest";
+    std::string a = std::string("INSERT INTO ") + table +
+        " VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
...
-    char* query = (char*)malloc(strlen(a) + arg_len + 32);
-    sprintf(query, a,
+    char* query = (char*)malloc(a.size() + arg_len + 32);
+    sprintf(query, a.c_str(),
         r->fields[0], r->fields[1], r->fields[2], r->fields[3], r->fields[4],
         r->fields[5], r->fields[6], r->fields[7], r->fields[8], r->fields[9]);
🤖 Prompt for AI Agents
In `@lib/ProxySQL_Admin_Stats.cpp` around lines 2594 - 2606, The reset path
deletes from stats_mcp_query_digest_reset but the subsequent insert always
writes into stats_mcp_query_digest; update the insert target based on the reset
flag so the reset branch re-populates stats_mcp_query_digest_reset. Concretely,
change how the INSERT string (variable a) is chosen before the loop in the block
that calls statsdb->execute and iterates resultset->rows: if reset is true set a
to an INSERT INTO stats_mcp_query_digest_reset ... string, otherwise keep the
INSERT INTO stats_mcp_query_digest ... string, then use that chosen string
inside the insertion loop so inserts go to the correct table.

Comment on lines +2605 to +2625
char* a = (char*)"INSERT INTO stats_mcp_query_digest VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
SQLite3_row* r = *it;
int arg_len = 0;
for (int i = 0; i < 10; i++) {
arg_len += strlen(r->fields[i]);
}
char* query = (char*)malloc(strlen(a) + arg_len + 32);
sprintf(query, a,
r->fields[0], // tool_name
r->fields[1], // run_id
r->fields[2], // digest
r->fields[3], // digest_text
r->fields[4], // count_star
r->fields[5], // first_seen
r->fields[6], // last_seen
r->fields[7], // sum_time
r->fields[8], // min_time
r->fields[9] // max_time
);
statsdb->execute(query);
⚠️ Potential issue | 🟠 Major

Avoid raw sprintf inserts for digest rows.

digest_text and other fields can contain quotes/JSON, which will break the SQL and can enable injection into the stats DB. Use a prepared statement with bound parameters (as you already do in other stats functions).

🛠️ Suggested approach (prepared statement)
-    char* a = (char*)"INSERT INTO stats_mcp_query_digest VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
-    for (...) {
-        ...
-        sprintf(query, a, ...);
-        statsdb->execute(query);
-        free(query);
-    }
+    const char* sql = reset
+        ? "INSERT INTO stats_mcp_query_digest_reset VALUES (?1,?2,?3,?4,?5,?6,?7,?8,?9,?10)"
+        : "INSERT INTO stats_mcp_query_digest VALUES (?1,?2,?3,?4,?5,?6,?7,?8,?9,?10)";
+    sqlite3_stmt* stmt = nullptr;
+    int rc = statsdb->prepare_v2(sql, &stmt);
+    ASSERT_SQLITE_OK(rc, statsdb);
+    for (SQLite3_row* r : resultset->rows) {
+        rc = (*proxy_sqlite3_bind_text)(stmt, 1, r->fields[0], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 2, r->fields[1], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 3, r->fields[2], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 4, r->fields[3], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 5, r->fields[4], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 6, r->fields[5], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 7, r->fields[6], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 8, r->fields[7], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 9, r->fields[8], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt,10, r->fields[9], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        SAFE_SQLITE3_STEP2(stmt);
+        rc = (*proxy_sqlite3_clear_bindings)(stmt); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_reset)(stmt); ASSERT_SQLITE_OK(rc, statsdb);
+    }
+    (*proxy_sqlite3_finalize)(stmt);
🤖 Prompt for AI Agents
In `@lib/ProxySQL_Admin_Stats.cpp` around lines 2605 - 2625, The current loop
builds SQL via sprintf into query (using template a) which allows quotes/JSON in
digest_text (and other r->fields[]) to break SQL or permit injection; replace
this with a prepared INSERT into stats_mcp_query_digest and bind the 10 values
from r->fields[0]..r->fields[9] for each SQLite3_row instead of sprintf/malloc.
Locate the loop using SQLite3_row* r and statsdb->execute and change it to
prepare a parameterized statement once (e.g., "INSERT INTO
stats_mcp_query_digest VALUES (?,?,?,?,?,?,?,?,?,?)"), bind each r->fields[i]
per row, execute/reset the statement for the next row, and remove the
sprintf/malloc usage to eliminate injection and quoting issues.

Comment on lines +7789 to +7793
if (success == false) {
if (resultset) {
free(resultset);
}
}
⚠️ Potential issue | 🟠 Major

Use delete for SQLite3_result cleanup.
execute_statement returns a heap-allocated C++ object (SQLite3_result); releasing it with free() skips the destructor and mismatches the allocator, which is undefined behavior.

🐛 Proposed fix
-    if (resultset) {
-        free(resultset);
-    }
+    if (resultset) {
+        delete resultset;
+    }
🤖 Prompt for AI Agents
In `@lib/ProxySQL_Admin.cpp` around lines 7789 - 7793, The cleanup code is calling
free(resultset) on an object returned by execute_statement (an instance of
SQLite3_result), which is undefined behavior; replace that call with the
appropriate C++ delete (i.e., delete resultset) and ensure any other cleanup
paths that free resultset are updated similarly so SQLite3_result is destroyed
properly (look for the resultset variable and usages around execute_statement
and the success==false branch in ProxySQL_Admin.cpp).

- Add special build rule in Makefile for anomaly_detection-t that includes:
  - $(OBJ) for global variables (GloVars, GloGATH)
  - -Wl,--allow-multiple-definition to allow test's main() to override ProxySQL's
  - ClickHouse client libraries (libclickhouse-cpp-lib.a, libzstdstatic.a, liblz4.a)
  - SQLite rembed library (libsqlite_rembed.a)
  - -lscram -lusual for PostgreSQL SCRAM support
- Add stub function SQLite3_Server_session_handler required by SQLite3_Server.cpp

Resolves compilation errors for anomaly_detection-t test.
- Add #include <string> for C++ std::string support
- Add #include <cmath> for sqrt() function
- Change format %lld to %ld for chrono duration types (long int, not long long)

Resolves compilation errors for vector_db_performance-t test.
- Add #include <string> for C++ std::string support

Resolves compilation errors for ai_llm_retry_scenarios-t test.
This commit addresses valid concerns raised by coding agents (Gemini, Copilot, CoderabbitAI):

1. Fix stats_mcp_query_digest naming conflict (ProxySQL_Admin.cpp):
   - Made reset and non-reset paths mutually exclusive using else block
   - Prevents both flags from being true, matching MySQL pattern
   - Ensures reset takes precedence over non-reset

2. Fix INSERT OR REPLACE sync issue (Admin_Handler.cpp):
   - Added DELETE before INSERT OR REPLACE in LOAD/SAVE MCP QUERY RULES
   - Prevents stale rules from persisting when syncing disk <-> memory
   - Ensures deleted source rows are also removed from target

3. Fix integer division truncation for timeout (Query_Tool_Handler.cpp):
   - Changed timeout_ms/1000 to (timeout_ms+999)/1000 for ceiling division
   - Ensures sub-second timeouts (e.g., 500ms) become at least 1 second
   - Prevents zero-second timeouts from causing unexpected behavior

4. Remove confusing comment (Discovery_Schema.cpp):
   - Simplified column count comment to be clear and accurate

Note: The re_modifiers parsing code already correctly handles VARCHAR "CASELESS"
to int conversion (lines 2414-2425), so that review comment was already addressed.
@renecannao
Author

Thank you to the three coding agents for the thorough code review!

Summary of Fixes

I've addressed the valid concerns raised by @gemini-code-assist[bot], @copilot, and @coderabbitai[bot] in a follow-up commit:

Issues Fixed:

  1. stats_mcp_query_digest naming conflict (ProxySQL_Admin.cpp)

    • Made reset and non-reset paths mutually exclusive using an else block
    • Matches the MySQL pattern for consistency
  2. INSERT OR REPLACE sync issue (Admin_Handler.cpp)

    • Added DELETE before INSERT OR REPLACE in LOAD/SAVE MCP QUERY RULES
    • Prevents stale rules from persisting when syncing disk ↔ memory
  3. Integer division truncation for timeout (Query_Tool_Handler.cpp)

    • Changed timeout_ms / 1000 to ceiling division: (timeout_ms + 999) / 1000
    • Ensures sub-second timeouts (e.g., 500ms) become at least 1 second
  4. Confusing comment about column count (Discovery_Schema.cpp)

    • Simplified the comment for clarity

Already Correct:

  • re_modifiers parsing - The code already properly handles VARCHAR "CASELESS" to int conversion (lines 2414-2425 in Discovery_Schema.cpp)

Items Requiring Further Discussion (deferred to future work):

Some excellent suggestions were noted for future enhancements:

  • Using std::unique_ptr instead of raw pointers for automatic memory management (see the sketch after this list)
  • Adding NULL checks after malloc/strdup allocations
  • Extracting cleanup logic into helper functions to reduce duplication
  • Completing the pieces that currently carry TODOs (e.g., username matching)

These are good ideas for follow-up PRs to improve code quality and robustness.
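As one illustration of the smart-pointer idea (a sketch under assumptions — this is not how the PR currently manages rows, and the FreeDeleter helper is hypothetical):

#include <cstdlib>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical deleter so std::unique_ptr can own strdup()'d C strings.
struct FreeDeleter {
    void operator()(char* p) const { free(p); }
};
using c_str_ptr = std::unique_ptr<char, FreeDeleter>;

int main() {
    // Each element frees itself when the vector is destroyed, even on an
    // early return, removing the need for the manual cleanup loop.
    std::vector<c_str_ptr> row;
    row.emplace_back(strdup("1"));        // rule_id
    row.emplace_back(strdup("CASELESS")); // re_modifiers
    return 0;
}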

Thanks again for the thoughtful reviews! 🙏


Copilot AI commented Jan 21, 2026

@renecannao I've opened a new pull request, #24, to work on those changes. Once the pull request is ready, I'll request review from you.

@gemini-code-assist

Hi @renecannao,

Thank you for the detailed summary of the fixes and for addressing the feedback from the review! It's great to see these improvements implemented.

I particularly appreciate the fixes for:

  • stats_mcp_query_digest naming conflict: Ensuring mutual exclusivity for reset and non-reset paths is crucial for data integrity and consistency with existing patterns.
  • Integer division truncation for timeout: The change to ceiling division for timeout_ms correctly handles sub-second timeouts, preventing unexpected behavior.
  • Confusing comment about column count: Clarifying comments always improves maintainability.

I also agree that the suggestions regarding std::unique_ptr, NULL checks for malloc/strdup, extracting cleanup logic, and completing TODO implementations are excellent ideas for follow-up PRs. Focusing on the core functionality first and then enhancing robustness and code quality in subsequent iterations is a sound approach.

Thanks again for your thorough work on this pull request! 🙏

@coderabbitai

coderabbitai bot commented Jan 21, 2026

Tip

For best results, initiate chat on the files or code changes.

@renecannao, thank you for the detailed summary and for addressing the review feedback so promptly! 🎉

Your fixes look solid:

  • The stats_mcp_query_digest mutual exclusion with if/else is a clean solution that matches existing patterns
  • Adding DELETE before INSERT OR REPLACE prevents stale data during sync operations
  • The ceiling division for timeout ensures sub-second timeouts don't get truncated to zero
  • Clarifying the column count comment improves readability

Great to hear the re_modifiers parsing was already handling VARCHAR-to-int conversion correctly!

Deferring the enhancement suggestions (smart pointers, NULL checks, helper extraction, TODO completion) to future PRs is a pragmatic approach that keeps this PR focused. Those are excellent candidates for follow-up quality improvements.

Thanks for being so responsive to the feedback! If you'd like me to review the new commit with these fixes, just let me know. 🙏

@renecannao renecannao merged commit af28598 into v3.1-MCP2 Jan 21, 2026
1 check passed