@renecannao

Summary

Implement firewall functionality and query digest statistics for MCP (Model Context Protocol) queries in ProxySQL, mirroring the existing MySQL query rules and statistics infrastructure.

Key Design

The action is inferred from rule properties, as with MySQL/PostgreSQL query rules:

  • error_msg != NULL → block
  • replace_pattern != NULL → rewrite
  • timeout_ms > 0 → timeout
  • OK_msg != NULL → return OK message
  • otherwise → allow

Actions are NOT mutually exclusive: a single rule can perform multiple actions simultaneously (for example, a rule with both replace_pattern and timeout_ms set rewrites the query and also applies the timeout).

Changes

  • Add mcp_query_rules table for firewall rules (17 columns)
  • Add runtime_mcp_query_rules table (same 17 columns, shows in-memory state)
  • Add stats_mcp_query_rules table with rule_id and hits
  • Add stats_mcp_query_digest table for query statistics (10 columns)
  • Implement rule evaluation with flagIN/flagOUT chaining (see the sketch after this list)
  • Implement query digest computation using SpookyHash
  • Implement hit counters maintained in-memory only
  • Add LOAD/SAVE commands for MCP query rules
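
As a rough illustration of the flagIN/flagOUT chaining above, the evaluation loop follows the same semantics as ProxySQL's MySQL query rules. The sketch below is illustrative only; the type and member names are simplified stand-ins, not the actual implementation:

#include <vector>

// Simplified stand-in for MCP_Query_Rule; matches() is stubbed out.
struct Rule {
	bool active;
	int flagIN;   // rule is considered only when the current flag equals flagIN
	int flagOUT;  // if set, becomes the current flag after a match
	bool apply;   // if true, stop evaluating further rules
	bool matches() const { return true; } // pattern/tool/schema checks go here
};

void evaluate(const std::vector<Rule>& rules) {
	int flag = 0; // evaluation starts with flagIN = 0
	for (const Rule& r : rules) {
		if (!r.active || r.flagIN != flag) continue; // not part of the current chain
		if (!r.matches()) continue;
		// ... accumulate actions here: rewrite, error_msg, OK_msg, timeout ...
		if (r.flagOUT > 0) flag = r.flagOUT; // chain into the next group of rules
		if (r.apply) break;                  // final matching rule for this query
	}
}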

Tables Added

  • mcp_query_rules - Config table for defining rules
  • runtime_mcp_query_rules - Runtime view of active rules
  • stats_mcp_query_rules - Hit counters per rule
  • stats_mcp_query_digest - Query digest statistics
  • stats_mcp_query_digest_reset - Reset-on-read stats

Commands Added

  • LOAD MCP QUERY RULES FROM DISK
  • LOAD MCP QUERY RULES TO MEMORY
  • LOAD MCP QUERY RULES TO RUNTIME
  • SAVE MCP QUERY RULES TO DISK
  • SAVE MCP QUERY RULES TO MEMORY
  • SAVE MCP QUERY RULES FROM RUNTIME

Documentation

This PR also includes comprehensive documentation for:

  • Static harvester functions
  • Query tool handler helpers
  • Schema isolation in catalog
  • MCP query rules and digest functions

Test plan

  • Create a test rule
  • Verify LOAD/SAVE commands work
  • Verify runtime_mcp_query_rules shows hits
  • Verify stats_mcp_query_rules tracks hits
  • Verify stats_mcp_query_digest tracks queries
  • Test rule evaluation (block, rewrite, timeout)
  • Test flagIN/flagOUT chaining

… schema

- Add ADMIN_SQLITE_TABLE_RUNTIME_MCP_QUERY_RULES schema (17 columns, same as mcp_query_rules)
- Fix STATS_SQLITE_TABLE_MCP_QUERY_RULES to only have rule_id and hits columns
- Add runtime_mcp_query_rules detection and refresh in ProxySQL_Admin
- Implement save_mcp_query_rules_from_runtime(bool _runtime) for both config and runtime tables
- Update get_mcp_query_rules() to return 17 columns (no hits)
- get_stats_mcp_query_rules() returns 2 columns (rule_id, hits)

Mirrors the MySQL query rules pattern:
- mcp_query_rules: config table (17 cols)
- runtime_mcp_query_rules: runtime state (17 cols)
- stats_mcp_query_rules: hit counters (2 cols)
- Add separate MCP QUERY RULES command block in Admin_Handler
- Fix string length comparison (21 chars for "SAVE/LOAD MCP QUERY RULES ")
- Add handlers for:
  - LOAD MCP QUERY RULES TO RUNTIME
  - SAVE MCP QUERY RULES TO DISK
  - SAVE MCP QUERY RULES TO MEMORY / FROM RUNTIME
- Register mcp_query_rules in disk database (tables_defs_config)

Previously MCP commands were incorrectly nested inside MYSQL/PGSQL
block and could not be reached. Now they have their own conditional
block.
- Add LOAD MCP QUERY RULES FROM DISK command
- Add LOAD MCP QUERY RULES TO MEMORY command
- Both commands copy rules from disk.mcp_query_rules to main.mcp_query_rules

This completes the full set of MCP query rules LOAD/SAVE commands,
matching the MySQL query rules pattern.
Change function signature from stats___mcp_query_rules(bool reset) to
stats___mcp_query_rules() to match MySQL query rules pattern.

The reset parameter was never used in the function body and MySQL's
stats___mysql_query_rules() has no reset parameter.
Previously re_modifiers was hardcoded to 1 (CASELESS), ignoring the
value stored in the database. Now properly reads from row->fields[7].
Add detailed function-level documentation to all MCP query rules,
query digest, static harvester, and catalog components.

Static_Harvester.cpp:
- Document all 18+ harvest functions (schemas, objects, columns, indexes, FKs, views)
- Document lifecycle methods (init, close, connect, disconnect)
- Document helper methods (is_time_type, is_id_like_name)
- Document run management (start_run, finish_run, run_full_harvest)
- Document statistics methods (get_harvest_stats)

Query_Tool_Handler.cpp:
- Document JSON helper functions (json_string, json_int, json_double)
- Document digest tracking section with flow explanation

MySQL_Catalog.cpp:
- Document schema isolation architecture
- Document CRUD operations (upsert, get, search, list, remove, merge)

Discovery_Schema.cpp:
- Document MCP query rules evaluation (evaluate_mcp_query_rules)
- Document digest functions (compute_mcp_digest, fingerprint_mcp_args)
- Document update/get functions for rules and digests

ProxySQL_Admin_Stats.cpp:
- Document stats collection functions

ProxySQL_Admin.cpp:
- Document load/save functions for query rules

Admin_Handler.cpp:
- Document MCP query rules command handlers

include/ProxySQL_Admin_Tables_Definitions.h:
- Add comments explaining table purposes

coderabbitai bot commented Jan 19, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.



Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a new MCP subsystem: in-memory rule storage and evaluation, digest/statistics tracking, admin/runtime table definitions and commands, catalog lifecycle and schema-aware catalog ops, plus a Static_Harvester for MySQL metadata harvesting.

Changes

  • MCP Data Structures & Declarations (include/Discovery_Schema.h, include/ProxySQL_Admin_Tables_Definitions.h, include/proxysql_admin.h): Adds MCP_Query_Rule, MCP_Query_Digest_Stats, MCP_Query_Processor_Output; new private state (rule vector, rwlocks, digest umap, version); new public MCP APIs (load/evaluate/get/update/get_stats/compute/fingerprint); adds 5 SQLite table macros for MCP rules, runtime rules, and digest/rules stats; new includes.
  • MCP Core Implementation (lib/Discovery_Schema.cpp, lib/Query_Tool_Handler.cpp): Implements rule lifecycle (load/compile rules), rule evaluation (multi-action: rewrite, block, timeout, OK, flags), digest computation (fingerprinting JSON args + SpookyHash), in-memory digest updates, and integrates rule evaluation & digest updates into the query execution path.
  • Admin Integration & CLI Handlers (lib/Admin_Bootstrap.cpp, lib/Admin_Handler.cpp, lib/ProxySQL_Admin.cpp, lib/ProxySQL_Admin_Stats.cpp): Registers MCP tables in admin/config/stats, adds CLI/admin handlers for LOAD/SAVE TO/FROM RUNTIME, implements runtime load/save functions, implements stats export functions for digest and rule hit counters, and wires MCP into GenericRefreshStatistics.
  • Catalog Lifecycle & Schema Ops (lib/MySQL_Catalog.cpp): Refactors the catalog to an explicit lifecycle (init/close), adds schema-scoped CRUD/search/list/upsert/merge/remove operations, creates schema indexes and FTS setup.
  • Metadata Harvester (lib/Static_Harvester.cpp): Implements Static_Harvester: MySQL connection management, a multi-stage harvest pipeline (schemas, objects, columns, indexes, FKs, views), quick profiles, FTS rebuild, and run-scoped tracking and stats.

Sequence Diagram(s)

sequenceDiagram
    participant QTH as Query_Tool_Handler
    participant DS as Discovery_Schema
    participant DB as AdminDB
    participant QP as MCP_Processor

    QTH->>DS: evaluate_mcp_query_rules(tool_name, schemaname, arguments, original_query)
    activate DS
    DS->>QP: iterate active rules (regex match / filters)
    activate QP
    QP->>QP: evaluate match_pattern, username, schemaname, flags
    QP-->>DS: MCP_Query_Processor_Output (actions: rewrite/error/OK/timeout, flags)
    deactivate QP
    DS-->>QTH: MCP_Query_Processor_Output*
    deactivate DS

    alt OK_msg set
        QTH->>QTH: return success with OK_msg (early)
    else error_msg set
        QTH->>QTH: return error_msg (early)
    else new_query provided
        QTH->>QTH: execute rewritten query
        QTH->>DS: compute_mcp_digest(tool_name, arguments)
        DS-->>QTH: digest, digest_text
        QTH->>DS: update_mcp_query_digest(tool_name, run_id, digest, digest_text, duration, ts)
        DS->>DB: (optional) persist digest stats / later export via stats___mcp_query_digest
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

🐰 In burrows of bytes I hop and sing,

Rules and digests now take wing,
I stitch the schema, chase a hash,
Rewrite a query in a flash,
Hooray — the Mission Control bells ring!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (Passed): The title 'feat: Add MCP query rules and digest statistics [WIP]' accurately describes the primary changes, which introduce MCP firewall functionality and query digest statistics tables and infrastructure.
  • Description check (Passed): The description comprehensively relates to the changeset, explaining the MCP query rules implementation, table structures, design decisions, and test plan items that align with the code changes.
  • Docstring Coverage (Passed): Docstring coverage is 82.00%, which is sufficient; the required threshold is 80.00%.




@gemini-code-assist

Summary of Changes

Hello @renecannao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends ProxySQL's capabilities by integrating advanced management and monitoring features for Model Context Protocol (MCP) queries. It introduces a flexible rule-based system to control MCP query behavior, allowing for dynamic modifications or restrictions. Concurrently, it provides deep insights into MCP query performance and usage patterns through detailed digest statistics, enhancing observability and operational control for MCP workloads.

Highlights

  • MCP Query Rules Implementation: Introduced a comprehensive system for MCP (Model Context Protocol) query rules, enabling firewall-like functionality such as blocking, rewriting, timing out, or returning immediate OK messages for MCP queries. This mirrors existing MySQL query rule capabilities.
  • Query Digest Statistics for MCP: Added functionality to collect and track query digest statistics for MCP queries. This includes metrics like execution count, total/min/max execution times, and first/last seen timestamps, using SpookyHash for query fingerprinting.
  • New Configuration and Statistics Tables: Created several new tables: 'mcp_query_rules' for persistent rule configuration, 'runtime_mcp_query_rules' for in-memory active rules, 'stats_mcp_query_rules' for rule hit counters, and 'stats_mcp_query_digest' along with 'stats_mcp_query_digest_reset' for query performance statistics.
  • Enhanced Management Commands: Implemented new 'LOAD' and 'SAVE' commands for MCP query rules, allowing administrators to manage rules between disk, memory, and runtime, similar to other ProxySQL configuration elements.
  • Extensive Documentation and Code Comments: Significantly enhanced code documentation with detailed Doxygen-style comments for static harvester functions, query tool handler helpers, schema isolation in the catalog, and all new MCP query rules and digest functions, improving code clarity and maintainability.


gemini-code-assist bot left a comment


Code Review

This pull request introduces significant new functionality by adding MCP query rules and digest statistics, mirroring existing infrastructure for MySQL/PostgreSQL. The overall design is consistent with the project's patterns. However, there are several areas for improvement. The new code heavily relies on manual memory management with raw pointers, which is error-prone and should be updated to use modern C++ idioms like smart pointers and std::string for better safety and maintainability. I've also identified some duplicated code, an incomplete feature implementation in the statistics handling, and a few inconsistencies between comments and code that should be addressed. My detailed comments highlight these points to help improve the robustness of this new feature.

Comment on lines +831 to +836
MCP_Query_Processor_Output* evaluate_mcp_query_rules(
	const std::string& tool_name,
	const std::string& schemaname,
	const nlohmann::json& arguments,
	const std::string& original_query
);


high

The function evaluate_mcp_query_rules returns a raw pointer MCP_Query_Processor_Output*, and the documentation states that the caller is responsible for destroying it. This manual ownership transfer is error-prone and can easily lead to memory leaks if the caller forgets to delete the object.

To make the code safer and more idiomatic, you should use a smart pointer to manage the lifetime of the returned object automatically.

	std::unique_ptr<MCP_Query_Processor_Output> evaluate_mcp_query_rules(
		const std::string& tool_name,
		const std::string& schemaname,
		const nlohmann::json& arguments,
		const std::string& original_query
	);

Comment on lines 2597 to 2600
// For now, we'll leave the table empty since MCP digest stats are stored in memory
// in the Discovery_Schema and would need to be accessed differently
// TODO: Implement proper access to Discovery_Schema digest statistics


high

This function is intended to populate the stats_mcp_query_digest table, but the implementation is incomplete. It currently only deletes existing data and contains a TODO comment, leaving the table empty. To complete this feature, you should fetch the digest statistics from the Discovery_Schema catalog (using get_mcp_query_digest) and insert the data into the stats table.

Comment on lines +23 to +49
struct MCP_Query_Rule {
	int rule_id;
	bool active;
	char *username;
	char *schemaname;
	char *tool_name;
	char *match_pattern;
	bool negate_match_pattern;
	int re_modifiers; // bitmask: 1=CASELESS
	int flagIN;
	int flagOUT;
	char *replace_pattern;
	int timeout_ms;
	char *error_msg;
	char *ok_msg;
	bool log;
	bool apply;
	char *comment;
	uint64_t hits; // in-memory only, not persisted to table
	void* regex_engine; // compiled regex (RE2)

	MCP_Query_Rule() : rule_id(0), active(false), username(NULL), schemaname(NULL),
		tool_name(NULL), match_pattern(NULL), negate_match_pattern(false),
		re_modifiers(1), flagIN(0), flagOUT(0), replace_pattern(NULL),
		timeout_ms(0), error_msg(NULL), ok_msg(NULL), log(false), apply(true),
		comment(NULL), hits(0), regex_engine(NULL) {}
};


medium

The MCP_Query_Rule struct uses C-style char* for string members and a void* for the regex engine. This requires manual memory management (e.g., strdup/free, new/delete) which is complex and error-prone in C++. Using modern C++ features would make the code safer and more maintainable.

I recommend the following changes:

  • Replace char* members with std::string to automate memory management.
  • Replace void* regex_engine with std::unique_ptr<re2::RE2> to manage the lifetime of the regex object automatically and provide type safety.

These changes would eliminate the need for manual memory management in the Discovery_Schema destructor and load_mcp_query_rules function, reducing the risk of memory leaks or corruption.
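
As a concrete sketch of that recommendation, assuming RE2 and C++17 std::optional are available (member names mirror the current struct; this is illustrative, not the actual patch):

#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <re2/re2.h>

struct MCP_Query_Rule {
	int rule_id = 0;
	bool active = false;
	std::optional<std::string> username;    // unset = match any user
	std::optional<std::string> schemaname;  // unset = match any schema
	std::optional<std::string> tool_name;
	std::optional<std::string> match_pattern;
	bool negate_match_pattern = false;
	int re_modifiers = 1;                   // bitmask: 1 = CASELESS
	int flagIN = 0;
	int flagOUT = 0;
	std::optional<std::string> replace_pattern;
	int timeout_ms = 0;
	std::optional<std::string> error_msg;
	std::optional<std::string> ok_msg;
	bool log = false;
	bool apply = true;
	std::optional<std::string> comment;
	uint64_t hits = 0;                      // in-memory only, not persisted
	std::unique_ptr<re2::RE2> regex_engine; // owned, typed compiled regex
};
// The destructor, move operations, and regex cleanup now come for free, so the
// manual free()/delete blocks in the destructor and load_mcp_query_rules() disappear.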

Comment on lines +351 to +372
db->execute(
	"CREATE TABLE IF NOT EXISTS mcp_query_rules ("
	" rule_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL ,"
	" active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 0 ,"
	" tool_name VARCHAR ,"
	" run_id INT ,"
	" match_pattern VARCHAR ,"
	" negate_match_pattern INT CHECK (negate_match_pattern IN (0,1)) NOT NULL DEFAULT 0 ,"
	" re_modifiers VARCHAR DEFAULT 'CASELESS' ,"
	" flagIN INT NOT NULL DEFAULT 0 ,"
	" flagOUT INT CHECK (flagOUT >= 0) ,"
	" action VARCHAR CHECK (action IN ('allow','block','rewrite','timeout')) NOT NULL DEFAULT 'allow' ,"
	" replace_pattern VARCHAR ,"
	" timeout_ms INT CHECK (timeout_ms >= 0) ,"
	" error_msg VARCHAR ,"
	" OK_msg VARCHAR ,"
	" log INT CHECK (log IN (0,1)) ,"
	" apply INT CHECK (apply IN (0,1)) NOT NULL DEFAULT 1 ,"
	" comment VARCHAR ,"
	" hits INTEGER NOT NULL DEFAULT 0"
	");"
);


medium

The CREATE TABLE statement for mcp_query_rules within this function appears to be dead code. The schema defined here (with run_id and action columns) is inconsistent with the mcp_query_rules table defined in ProxySQL_Admin_Tables_Definitions.h and used by the admin interface. The application logic loads rules from the admin database into an in-memory structure and does not appear to use this table. This code should be removed to avoid confusion and reduce maintenance overhead.

Comment on lines +2372 to +2387
// Clear existing rules
for (auto rule : mcp_query_rules) {
	if (rule->regex_engine) {
		delete (re2::RE2*)rule->regex_engine;
	}
	free(rule->username);
	free(rule->schemaname);
	free(rule->tool_name);
	free(rule->match_pattern);
	free(rule->replace_pattern);
	free(rule->error_msg);
	free(rule->ok_msg);
	free(rule->comment);
	delete rule;
}
mcp_query_rules.clear();


medium

This block of code for clearing existing rules is nearly identical to the cleanup logic in the Discovery_Schema destructor. Duplicating this complex memory management logic increases the risk of bugs and makes maintenance harder.

Consider extracting this cleanup logic into a private helper function (e.g., clear_mcp_query_rules()) that can be called from both the destructor and load_mcp_query_rules to avoid code duplication.
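
A possible shape for that helper, assembled from the cleanup code quoted above (the method name clear_mcp_query_rules() is hypothetical):

void Discovery_Schema::clear_mcp_query_rules() {
	for (auto rule : mcp_query_rules) {
		if (rule->regex_engine) {
			delete (re2::RE2*)rule->regex_engine; // compiled regex
		}
		free(rule->username);
		free(rule->schemaname);
		free(rule->tool_name);
		free(rule->match_pattern);
		free(rule->replace_pattern);
		free(rule->error_msg);
		free(rule->ok_msg);
		free(rule->comment);
		delete rule;
	}
	mcp_query_rules.clear();
}

Both the destructor and load_mcp_query_rules() would then reduce to a single call to this helper.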

Comment on lines +2503 to +2507
if (rule->username) {
	// For now, we don't have username in MCP context, skip if set
	// TODO: Add username matching when available
	continue;
}


medium

The current logic for username matching will cause any rule with a username specified to be skipped, as the code immediately continues if rule->username is not null. The TODO comment indicates this might be intentional for now, but it effectively makes the username field in mcp_query_rules non-functional. This could be confusing for users trying to create rules based on usernames.

Comment on lines +2971 to +2972
// Input: {"filters": {"status": "active", "age": 25}}
// Output: {"filters":{"?":"?","?":"?"}}


medium

The example in the documentation for fingerprint_mcp_args is incorrect and does not match the implementation. The implementation preserves object keys, while the example suggests keys are also replaced with ?.

The example:
Input: {"filters": {"status": "active", "age": 25}}
Output: {"filters":{"?":"?","?":"?"}}

Based on the implementation, the output should be:
Output: {"filters":{"status":"?","age":"?"}}

Please update the comment to accurately reflect the function's behavior.
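
For reference, a self-contained sketch of the documented behavior (keys preserved, scalar values masked), followed by hashing the canonical text with SpookyHash as the PR describes. The function names, the tool-name/argument concatenation, and the "SpookyV2.h" header path are assumptions, not the actual implementation:

#include <cstdint>
#include <string>
#include <nlohmann/json.hpp>
#include "SpookyV2.h" // assumed location of the bundled SpookyHash V2 header

nlohmann::json fingerprint(const nlohmann::json& j) {
	if (j.is_object()) {
		nlohmann::json out = nlohmann::json::object();
		for (auto it = j.begin(); it != j.end(); ++it) {
			out[it.key()] = fingerprint(it.value()); // keys survive
		}
		return out;
	}
	if (j.is_array()) {
		nlohmann::json out = nlohmann::json::array();
		for (const auto& e : j) out.push_back(fingerprint(e));
		return out;
	}
	return "?"; // every scalar (string, number, bool, null) is masked
}

uint64_t digest_of(const std::string& tool, const nlohmann::json& args) {
	std::string canonical = tool + ":" + fingerprint(args).dump();
	return SpookyHash::Hash64(canonical.data(), canonical.size(), 0);
}

// fingerprint({"filters": {"status": "active", "age": 25}})
//   -> {"filters":{"age":"?","status":"?"}} (nlohmann::json orders keys)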

Comment on lines +2634 to +2645
char* a = (char*)"INSERT INTO stats_mcp_query_rules VALUES (\"%s\",\"%s\")";
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
	SQLite3_row* r = *it;
	int arg_len = 0;
	for (int i = 0; i < 2; i++) {
		arg_len += strlen(r->fields[i]);
	}
	char* query = (char*)malloc(strlen(a) + arg_len + 32);
	sprintf(query, a, r->fields[0], r->fields[1]);
	statsdb->execute(query);
	free(query);
}


medium

Using sprintf to construct SQL queries is generally unsafe and can lead to SQL injection vulnerabilities, even if the inputs seem safe. It's better to use prepared statements to handle parameter binding securely.

While in this specific case the inputs are generated from integers, adopting prepared statements as a standard practice will make the code more robust and secure against future changes.
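
For illustration, the same insert written against the raw SQLite C API's prepared statements (ProxySQL routes these calls through its own proxy_sqlite3_* wrappers, so the exact entry points here are an assumption):

#include <sqlite3.h>

void insert_rule_hits(sqlite3* db, const char* rule_id, const char* hits) {
	sqlite3_stmt* stmt = nullptr;
	const char* sql = "INSERT INTO stats_mcp_query_rules VALUES (?1, ?2)";
	if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK) return;
	sqlite3_bind_text(stmt, 1, rule_id, -1, SQLITE_TRANSIENT);
	sqlite3_bind_text(stmt, 2, hits, -1, SQLITE_TRANSIENT);
	sqlite3_step(stmt);     // bound values never touch the SQL text
	sqlite3_finalize(stmt); // no malloc/sprintf bookkeeping needed
}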

@renecannao changed the title from "feat: Add MCP query rules and digest statistics" to "feat: Add MCP query rules and digest statistics [WIP]" on Jan 19, 2026
@renecannao requested a review from Copilot on January 20, 2026 09:01
@renecannao
Author

@coderabbitai review


coderabbitai bot commented Jan 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


Copilot AI left a comment


Pull request overview

This pull request implements firewall functionality and query digest statistics for MCP (Model Context Protocol) queries in ProxySQL, mirroring the existing MySQL query rules and statistics infrastructure.

Changes:

  • Adds MCP query rules tables (mcp_query_rules, runtime_mcp_query_rules) with 17 columns for defining firewall rules
  • Implements query digest computation and statistics tracking (stats_mcp_query_digest, stats_mcp_query_rules)
  • Adds LOAD/SAVE commands for MCP query rules following ProxySQL patterns
  • Includes comprehensive documentation for Static Harvester, Query Tool Handler, and catalog isolation features

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 17 comments.

Summary per file:
  • lib/Static_Harvester.cpp: Adds comprehensive documentation for all static harvester functions and workflow
  • lib/Query_Tool_Handler.cpp: Implements MCP query rules evaluation and digest tracking integration into query execution flow
  • lib/ProxySQL_Admin_Stats.cpp: Adds stats collection functions for MCP query digest and rules (incomplete implementation)
  • lib/ProxySQL_Admin.cpp: Implements LOAD/SAVE commands for MCP query rules with proper escaping
  • lib/MySQL_Catalog.cpp: Documents schema isolation for multi-tenancy support
  • lib/Discovery_Schema.cpp: Core implementation of MCP query rules evaluation, digest computation, and statistics (has critical schema issues)
  • lib/Admin_Handler.cpp: Adds command parsing for MCP query rules LOAD/SAVE operations
  • lib/Admin_Bootstrap.cpp: Registers MCP query rules and stats tables in admin and config databases
  • include/proxysql_admin.h: Declares MCP query rules load/save and stats functions
  • include/ProxySQL_Admin_Tables_Definitions.h: Defines table schemas for MCP query rules and digest statistics
  • include/Discovery_Schema.h: Declares MCP query rule structures, processor output, and digest functions


SQLite3_result* Discovery_Schema::get_mcp_query_digest(bool reset) {
	SQLite3_result* result = new SQLite3_result();

	// Define columns (10 columns, not 11 - digest_text was duplicated)

Copilot AI Jan 20, 2026


The comment at line 2839 states "10 columns, not 11 - digest_text was duplicated", but counting the add_column_definition calls from lines 2840-2849, there are exactly 10 columns as expected. The comment is confusing because it suggests there was a duplication issue that was fixed, but this should be clarified or removed if it's no longer relevant.

Suggested change
// Define columns (10 columns, not 11 - digest_text was duplicated)
// Define columns for MCP query digest statistics

Copilot uses AI. Check for mistakes.
Comment on lines 2581 to 2602
void ProxySQL_Admin::stats___mcp_query_digest(bool reset) {
	if (!GloMCPH) return;
	Query_Tool_Handler* qth = GloMCPH->query_tool_handler;
	if (!qth) return;

	// Get the discovery schema catalog
	// Note: This is a simplified implementation that queries the catalog database
	// In a full implementation, we would access the Discovery_Schema directly
	statsdb->execute("BEGIN");

	if (reset) {
		statsdb->execute("DELETE FROM stats_mcp_query_digest_reset");
	} else {
		statsdb->execute("DELETE FROM stats_mcp_query_digest");
	}

	// For now, we'll leave the table empty since MCP digest stats are stored in memory
	// in the Discovery_Schema and would need to be accessed differently
	// TODO: Implement proper access to Discovery_Schema digest statistics

	statsdb->execute("COMMIT");
}

Copilot AI Jan 20, 2026


The function signature and documentation describe a parameter 'reset' for stats___mcp_query_digest, but the implementation is incomplete. The TODO comment at line 2599 indicates that the digest statistics access needs proper implementation. The function currently only deletes rows from the table but doesn't populate it with actual data from the Discovery_Schema. This will result in empty stats tables when queried.

Comment on lines +2634 to +2645
char* a = (char*)"INSERT INTO stats_mcp_query_rules VALUES (\"%s\",\"%s\")";
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
	SQLite3_row* r = *it;
	int arg_len = 0;
	for (int i = 0; i < 2; i++) {
		arg_len += strlen(r->fields[i]);
	}
	char* query = (char*)malloc(strlen(a) + arg_len + 32);
	sprintf(query, a, r->fields[0], r->fields[1]);
	statsdb->execute(query);
	free(query);
}

Copilot AI Jan 20, 2026


Memory leak potential: The result set returned from get_stats_mcp_query_rules is deleted at line 2647, but if the function returns early at lines 2582, 2584, or 2629, the resultset would not be cleaned up. However, reviewing the code more carefully, the resultset is only created inside the get_stats_mcp_query_rules call at line 2628, so early returns before that are safe. But there's still a potential issue - if any of the malloc/sprintf/execute operations fail between lines 2634-2645, the function should handle errors gracefully.

rule->tool_name = row->fields[4] ? strdup(row->fields[4]) : NULL; // tool_name
rule->match_pattern = row->fields[5] ? strdup(row->fields[5]) : NULL; // match_pattern
rule->negate_match_pattern = row->fields[6] ? atoi(row->fields[6]) != 0 : false; // negate_match_pattern
rule->re_modifiers = row->fields[7] ? atoi(row->fields[7]) : 1; // default CASELESS

Copilot AI Jan 20, 2026


Type mismatch for re_modifiers: The table definition stores re_modifiers as VARCHAR DEFAULT 'CASELESS' (line 342 in ProxySQL_Admin_Tables_Definitions.h and line 359 in Discovery_Schema.cpp), but the loading code at line 2404 attempts to parse it as an integer using atoi(). This will fail - atoi('CASELESS') returns 0, not the expected bitmask value. The code expects re_modifiers to be an integer bitmask (1=CASELESS per line 2420), but the schema defines it as a string. Either change the schema to INT or update the loading logic to parse string values like 'CASELESS' to the corresponding bitmask.
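
One way to reconcile the mismatch is to parse the stored string into the expected bitmask; the helper below is a hypothetical sketch, not code from this PR:

#include <cstdlib>
#include <strings.h> // strcasecmp (POSIX)

static int parse_re_modifiers(const char* s) {
	if (s == NULL || *s == '\0') return 1;        // default: CASELESS
	if (strcasecmp(s, "CASELESS") == 0) return 1; // bitmask bit 1, per the header comment
	return atoi(s); // still accepts schemas that store the integer directly
}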

Comment on lines 1602 to 1644
MCP_Query_Processor_Output* qpo = catalog->evaluate_mcp_query_rules(
	tool_name,
	schema,
	arguments,
	sql
);

// Check for OK_msg (return success without executing)
if (qpo->OK_msg) {
	unsigned long long duration = monotonic_time() - start_time;
	track_tool_invocation(this, tool_name, schema, duration);
	catalog->log_query_tool_call(tool_name, schema, 0, start_time, duration, "OK message from query rule");
	result = create_success_response(qpo->OK_msg);
	delete qpo;
	return result;
}

// Check for error_msg (block the query)
if (qpo->error_msg) {
	unsigned long long duration = monotonic_time() - start_time;
	track_tool_invocation(this, tool_name, schema, duration);
	catalog->log_query_tool_call(tool_name, schema, 0, start_time, duration, "Blocked by query rule");
	result = create_error_response(qpo->error_msg);
	delete qpo;
	return result;
}

// Apply rewritten query if provided
if (qpo->new_query) {
	sql = *qpo->new_query;
}

// Apply timeout if provided
if (qpo->timeout_ms > 0) {
	timeout_sec = qpo->timeout_ms / 1000;
}

// Apply log flag if set
if (qpo->log == 1) {
	// TODO: Implement query logging if needed
}

delete qpo;

Copilot AI Jan 20, 2026


Memory leak: The qpo object allocated at line 1602 is deleted at lines 1615, 1625, and 1644, but if execution continues past line 1644 and an exception is thrown in the query execution code (lines 1647-1693), the qpo object will not be deleted, causing a memory leak. Consider using RAII (std::unique_ptr) or ensuring cleanup in all code paths.
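
A sketch of the RAII variant with stubbed-in types (Output and respond() stand in for MCP_Query_Processor_Output and the create_*_response helpers):

#include <memory>

struct Output { const char* OK_msg = nullptr; const char* error_msg = nullptr; };
Output* evaluate_rules() { return new Output(); } // stands in for evaluate_mcp_query_rules
int respond(const char*) { return 0; }            // stands in for create_*_response

int handle_query() {
	std::unique_ptr<Output> qpo(evaluate_rules());
	if (qpo->OK_msg)    return respond(qpo->OK_msg);    // qpo freed automatically
	if (qpo->error_msg) return respond(qpo->error_msg); // likewise
	// ... execute the (possibly rewritten) query; an exception thrown here
	// still destroys qpo, closing the leak described above ...
	return 0;
}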

Comment on lines +7789 to +7793
if (success == false) {
	if (resultset) {
		free(resultset);
	}
}

Copilot AI Jan 20, 2026


Potential memory leak in error path: If resultset allocation succeeds but the execute_statement fails (returning an error), the function returns NULL at line 7789 without freeing the resultset. The condition at line 7789-7792 attempts to free the resultset only when success is false, but the resultset pointer may already be allocated by execute_statement even on error. The cleanup logic should verify whether resultset was allocated before freeing it.

Comment on lines +2800 to +2804
// Periodically persist to SQLite (every 100 updates or so)
static thread_local unsigned int update_count = 0;
if (++update_count % 100 == 0) {
	// TODO: Implement batch persistence
}

Copilot AI Jan 20, 2026


Thread safety issue: The static thread_local variable update_count at line 2801 is incremented without any synchronization. While thread_local ensures each thread has its own copy, if the same thread processes multiple queries concurrently (which is unlikely but possible in some threading models), this could lead to race conditions. However, since the persistence is only a TODO at line 2803, this is not currently a functional issue but should be addressed when implementing batch persistence.

Comment on lines 1634 to 1637
// Apply timeout if provided
if (qpo->timeout_ms > 0) {
	timeout_sec = qpo->timeout_ms / 1000;
}

Copilot AI Jan 20, 2026


The timeout_sec variable is declared but never used after being set at line 1636. The variable appears to be intended for future query timeout enforcement but is not actually applied to the query execution. This could mislead developers into thinking timeout enforcement is active when it's not.

Suggested change
// Apply timeout if provided
if (qpo->timeout_ms > 0) {
timeout_sec = qpo->timeout_ms / 1000;
}
// Note: qpo->timeout_ms is currently not enforced. Timeout handling
// should be implemented where queries are actually executed.
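
If the timeout is enforced later, note also that plain integer division truncates sub-second values (500 ms would become 0 s); a ceil-style conversion, as a sketch, avoids that:

int timeout_sec_from_ms(int timeout_ms) {
	if (timeout_ms <= 0) return 0;    // keep current behavior when unset
	return (timeout_ms + 999) / 1000; // 500 ms -> 1 s, 1500 ms -> 2 s
}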

// Output: {"sql":"?","timeout":"?"}
//
// Input: {"filters": {"status": "active", "age": 25}}
// Output: {"filters":{"?":"?","?":"?"}}

Copilot AI Jan 20, 2026


The comment at line 2971 shows an example output {"filters":{"?":"?","?":"?"}} which incorrectly suggests that object keys are replaced with "?". However, the code at line 2988 preserves the key using it.key(), so the correct output should be {"filters":{"status":"?","age":"?"}}. The documentation should be corrected to accurately reflect the fingerprinting behavior.

Suggested change
// Output: {"filters":{"?":"?","?":"?"}}
// Output: {"filters":{"status":"?","age":"?"}}


// Check schemaname match
if (rule->schemaname) {
	if (!schemaname.empty() && strcmp(rule->schemaname, schemaname.c_str()) != 0) {

Copilot AI Jan 20, 2026


The condition at line 2511 checks if schemaname is not empty before comparing with rule->schemaname, but it doesn't handle the case where rule->schemaname is NULL and schemaname is empty. If rule->schemaname is NULL, it should match an empty schemaname parameter (meaning "match all schemas"). The current logic will skip the rule if schemaname is empty but rule->schemaname is not NULL, which may not be the intended behavior. Consider: if rule->schemaname is NULL, it should match any schema; if rule->schemaname is not NULL and schemaname is not empty, only match if they're equal.

Suggested change
if (!schemaname.empty() && strcmp(rule->schemaname, schemaname.c_str()) != 0) {
// Rules with an explicit schemaname only apply when a matching
// non-empty schemaname is provided.
if (schemaname.empty() || strcmp(rule->schemaname, schemaname.c_str()) != 0) {


coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
lib/MySQL_Catalog.cpp (3)

278-301: SQL injection vulnerability: user input concatenated directly into query.

The schema, kind, tags, and query parameters are concatenated directly into the SQL string without sanitization or parameterized binding. An attacker could inject malicious SQL.

🔒 Recommended fix using parameterized queries
 std::string MySQL_Catalog::search(
 	const std::string& schema,
 	const std::string& query,
 	const std::string& kind,
 	const std::string& tags,
 	int limit,
 	int offset
 ) {
-	std::ostringstream sql;
-	sql << "SELECT schema, kind, key, document, tags ,  links FROM catalog WHERE 1=1";
-
-	// Add schema filter
-	if (!schema.empty()) {
-		sql << " AND schema = '" << schema << "'";
-	}
-
-	// Add kind filter
-	if (!kind.empty()) {
-		sql << " AND kind = '" << kind << "'";
-	}
-
-	// Add tags filter
-	if (!tags.empty()) {
-		sql << " AND tags LIKE '%" << tags << "%'";
-	}
-
-	// Add search query
-	if (!query.empty()) {
-		sql << " AND (key LIKE '%" << query << "%' "
-		     << "OR document LIKE '%" << query << "%' "
-		     << "OR tags LIKE '%" << query << "%')";
-	}
+	// Use parameterized query to prevent SQL injection
+	sqlite3_stmt* stmt = NULL;
+	std::string sql = "SELECT schema, kind, key, document, tags, links FROM catalog WHERE 1=1";
+	
+	std::vector<std::string> bindings;
+	if (!schema.empty()) {
+		sql += " AND schema = ?";
+		bindings.push_back(schema);
+	}
+	if (!kind.empty()) {
+		sql += " AND kind = ?";
+		bindings.push_back(kind);
+	}
+	// ... continue with parameterized bindings for tags and query

368-390: SQL injection vulnerability in list() method.

Same issue as search(): schema and kind are concatenated directly into SQL without sanitization.


505-511: SQL injection vulnerability in remove() method.

The schema, kind, and key parameters are directly concatenated into the DELETE statement.

🔒 Recommended fix
 int MySQL_Catalog::remove(
 	const std::string& schema,
 	const std::string& kind,
 	const std::string& key
 ) {
-	std::ostringstream sql;
-	sql << "DELETE FROM catalog WHERE 1=1";
-
-	if (!schema.empty()) {
-		sql << " AND schema = '" << schema << "'";
-	}
-	sql << " AND kind = '" << kind << "' AND key = '" << key << "'";
-
-	if (!db->execute(sql.str().c_str())) {
+	sqlite3_stmt* stmt = NULL;
+	const char* delete_sql = schema.empty() 
+		? "DELETE FROM catalog WHERE kind = ?1 AND key = ?2"
+		: "DELETE FROM catalog WHERE schema = ?1 AND kind = ?2 AND key = ?3";
+	
+	int rc = db->prepare_v2(delete_sql, &stmt);
+	if (rc != SQLITE_OK) {
+		proxy_error("Failed to prepare catalog remove: %d\n", rc);
+		return -1;
+	}
+	
+	int idx = 1;
+	if (!schema.empty()) {
+		(*proxy_sqlite3_bind_text)(stmt, idx++, schema.c_str(), -1, SQLITE_TRANSIENT);
+	}
+	(*proxy_sqlite3_bind_text)(stmt, idx++, kind.c_str(), -1, SQLITE_TRANSIENT);
+	(*proxy_sqlite3_bind_text)(stmt, idx++, key.c_str(), -1, SQLITE_TRANSIENT);
+	
+	SAFE_SQLITE3_STEP2(stmt);
+	(*proxy_sqlite3_finalize)(stmt);
+	
+	return 0;
lib/Static_Harvester.cpp (3)

398-423: SQL injection in fetch_schemas; use parameterized query.

The filter parameter is concatenated directly into the SQL query. While this may only be called internally, using parameterized queries is a best practice for defense-in-depth.

🔒 Suggested fix
 std::vector<Static_Harvester::SchemaRow> Static_Harvester::fetch_schemas(const std::string& filter) {
 	std::vector<SchemaRow> schemas;
 
-	std::ostringstream sql;
-	sql << "SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME "
-	    << "FROM information_schema.SCHEMATA";
-
-	if (!filter.empty()) {
-		sql << " WHERE SCHEMA_NAME = '" << filter << "'";
-	}
-
-	sql << " ORDER BY SCHEMA_NAME;";
+	std::string sql = "SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME "
+	    "FROM information_schema.SCHEMATA";
+	
+	if (!filter.empty()) {
+		// Escape single quotes in filter to prevent injection
+		std::string escaped_filter;
+		for (char c : filter) {
+			if (c == '\'') escaped_filter += "''";
+			else escaped_filter += c;
+		}
+		sql += " WHERE SCHEMA_NAME = '" + escaped_filter + "'";
+	}
+	
+	sql += " ORDER BY SCHEMA_NAME;";

Or better, use MySQL prepared statements via mysql_stmt_* APIs.
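
A sketch of that mysql_stmt_* route, with illustrative names (the actual harvester query selects more columns):

#include <mysql/mysql.h>
#include <cstring>

bool fetch_schema(MYSQL* conn, const char* schema_name) {
	const char* sql =
		"SELECT SCHEMA_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME = ?";
	MYSQL_STMT* stmt = mysql_stmt_init(conn);
	if (!stmt || mysql_stmt_prepare(stmt, sql, strlen(sql))) return false;

	MYSQL_BIND param;
	memset(&param, 0, sizeof(param));
	unsigned long len = strlen(schema_name);
	param.buffer_type = MYSQL_TYPE_STRING;
	param.buffer = (void*)schema_name;
	param.buffer_length = len;
	param.length = &len;

	bool ok = !mysql_stmt_bind_param(stmt, &param) && !mysql_stmt_execute(stmt);
	mysql_stmt_close(stmt); // the filter value never touches the SQL text
	return ok;
}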


479-513: Same SQL injection pattern in fetch_tables_views.

The filter parameter is directly concatenated. Apply the same fix as fetch_schemas.


1036-1055: SQL injection in harvest_view_definitions via view_def.

Line 1045 inserts view_def directly into an UPDATE statement. View definitions from MySQL could contain single quotes or malicious content.

🔒 Use parameterized update
-		std::ostringstream update_sql;
-		update_sql << "UPDATE objects SET definition_sql = '" << view_def << "' "
-		           << "WHERE run_id = " << current_run_id
-		           << " AND schema_name = '" << schema_name << "'"
-		           << " AND object_name = '" << view_name << "'"
-		           << " AND object_type = 'view';";
-
-		catalog->get_db()->execute_statement(update_sql.str().c_str(), &error, &cols, &affected);
+		sqlite3_stmt* stmt = NULL;
+		const char* update_sql = "UPDATE objects SET definition_sql = ?1 "
+			"WHERE run_id = ?2 AND schema_name = ?3 AND object_name = ?4 AND object_type = 'view'";
+		
+		int rc = catalog->get_db()->prepare_v2(update_sql, &stmt);
+		if (rc == SQLITE_OK) {
+			(*proxy_sqlite3_bind_text)(stmt, 1, view_def.c_str(), -1, SQLITE_TRANSIENT);
+			(*proxy_sqlite3_bind_int)(stmt, 2, current_run_id);
+			(*proxy_sqlite3_bind_text)(stmt, 3, schema_name.c_str(), -1, SQLITE_TRANSIENT);
+			(*proxy_sqlite3_bind_text)(stmt, 4, view_name.c_str(), -1, SQLITE_TRANSIENT);
+			SAFE_SQLITE3_STEP2(stmt);
+			(*proxy_sqlite3_finalize)(stmt);
+		}
lib/Discovery_Schema.cpp (2)

1256-1267: SQL injection in fts_search via query parameter.

The query, object_type, and schema_name parameters are directly concatenated into the FTS MATCH clause and WHERE conditions.

🔒 Escape or validate FTS query input

FTS5 MATCH queries require special handling. At minimum, escape single quotes:

-	sql << "SELECT object_key, schema_name, object_name, object_type, tags ,  bm25(fts_objects) AS score "
-	    << "FROM fts_objects WHERE fts_objects MATCH '" << query << "'";
+	// Escape single quotes for FTS query
+	std::string escaped_query;
+	for (char c : query) {
+		if (c == '\'') escaped_query += "''";
+		else escaped_query += c;
+	}
+	sql << "SELECT object_key, schema_name, object_name, object_type, tags, bm25(fts_objects) AS score "
+	    << "FROM fts_objects WHERE fts_objects MATCH '" << escaped_query << "'";

2075-2087: SQL injection in fts_search_llm.

Same issue — the query parameter is directly concatenated into the MATCH clause.

🤖 Fix all issues with AI agents
In `@lib/Admin_Handler.cpp`:
- Around line 2362-2383: The current INSERT OR REPLACE SELECT statements in the
LOAD and SAVE branches (where *q is set via l_strdup to "INSERT OR REPLACE INTO
main.mcp_query_rules SELECT * FROM disk.mcp_query_rules" and "INSERT OR REPLACE
INTO disk.mcp_query_rules SELECT * FROM main.mcp_query_rules") do not remove
source-deleted rows and can resurrect stale rules; modify the SQL assigned to *q
in both branches so the target table is cleared before copying (e.g., issue a
DELETE FROM <target_table> or use a transactional sequence "DELETE FROM
<target_table>; INSERT INTO <target_table> SELECT ...") while keeping the same
surrounding logic (the blocks that test query_no_space_length and call l_free,
l_strdup, set *ql and return true) and ensure both the LOAD (target
main.mcp_query_rules) and SAVE (target disk.mcp_query_rules) cases are updated.

In `@lib/Discovery_Schema.cpp`:
- Around line 2655-2684: The code allocates pta with malloc and many strdup
calls (symbols: pta, malloc, strdup, result->add_row) without checking for NULL;
add checks after malloc and after each strdup (or use a helper that allocates
and duplicates safely) so that on allocation failure you free any
already-allocated entries, free pta, and avoid calling result->add_row (return
an error or handle gracefully). Specifically: verify pta != NULL immediately
after malloc, check each strdup result before storing it into pta (or wrap
duplication to return NULL on failure and propagate), on any allocation failure
free any non-NULL pta[i] and pta itself, and only call result->add_row when all
required allocations succeeded.
- Around line 2851-2895: The code has a TOCTOU race: it unlocks the read lock
(pthread_rwlock_rdlock / pthread_rwlock_unlock) then later acquires the write
lock (pthread_rwlock_wrlock) when reset is true, allowing concurrent
modification of mcp_digest_umap; change the locking so that if reset is true you
acquire the write lock before iterating (i.e., call pthread_rwlock_wrlock
instead of pthread_rwlock_rdlock at the start when reset is true) and hold that
lock for both building rows and clearing/deleting entries (use mcp_digest_umap,
MCP_Query_Digest_Stats, result->add_row references), otherwise keep the current
read-lock path—ensure you do not double-lock and that the unlock used matches
the lock acquired. (A sketch of this locking discipline follows after this list.)

In `@lib/ProxySQL_Admin.cpp`:
- Around line 1358-1363: The bug is that stats_mcp_query_digest_reset contains
stats_mcp_query_digest so both flags get set; update the checks in the block
that reads query_no_space so the reset path is handled exclusively (mirror
MySQL/PGSQL): test for "stats_mcp_query_digest_reset" before testing for
"stats_mcp_query_digest" (or use a precise token/boundary match) so only
stats_mcp_query_digest_reset becomes true and refresh is set once; adjust the
logic around the stats_mcp_query_digest_reset and stats_mcp_query_digest flag
assignments to avoid double-triggering the reset and non-reset handling.

In `@lib/Query_Tool_Handler.cpp`:
- Around line 1635-1637: The conversion of qpo->timeout_ms to timeout_sec uses
integer division which truncates sub-second values to zero (e.g., 500ms => 0);
change the conversion logic in Query_Tool_Handler.cpp around the timeout
handling (variable timeout_sec and qpo->timeout_ms) to round up instead of
truncating so any non-zero millisecond timeout becomes at least 1 second (e.g.,
replace the direct /1000 integer division with a ceil-style conversion such as
(timeout_ms + 999)/1000 or use floating-point division plus ceil) and keep
existing behavior when timeout_ms is 0 or negative.
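
To make the Discovery_Schema locking fix above concrete, a minimal sketch of the single-lock discipline it describes (member names follow the review text; real code must also delete the MCP_Query_Digest_Stats objects before clearing the map):

#include <cstdint>
#include <pthread.h>
#include <string>
#include <unordered_map>

static pthread_rwlock_t digest_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static std::unordered_map<std::string, std::unordered_map<uint64_t, void*>> mcp_digest_umap;

void export_digests(bool reset) {
	if (reset) pthread_rwlock_wrlock(&digest_rwlock); // exclusive: we will mutate
	else       pthread_rwlock_rdlock(&digest_rwlock); // shared: read-only export
	for (auto& entry : mcp_digest_umap) {
		(void)entry; // ... build result rows from the digest stats here ...
	}
	if (reset) mcp_digest_umap.clear(); // safe: still under the write lock
	pthread_rwlock_unlock(&digest_rwlock); // matches whichever lock was taken
}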
♻️ Duplicate comments (4)
lib/ProxySQL_Admin_Stats.cpp (2)

2581-2602: Implementation remains incomplete as previously noted.

The function deletes existing rows but never inserts new data, leaving the table permanently empty. The TODO comment at lines 2597-2599 acknowledges this. The digest statistics are stored in memory in the Discovery_Schema and need to be accessed via get_mcp_query_digest() to complete this implementation.


2634-2645: The sprintf pattern follows existing codebase conventions.

This implementation mirrors the pattern used in stats___mysql_query_rules() (lines 1385-1396) and stats___pgsql_query_rules() (lines 1407-1418). While prepared statements would be safer as previously noted, this is consistent with the established codebase style. The input data (rule_id and hits) comes from the catalog's internal integer counters, limiting injection risk in practice.

lib/Discovery_Schema.cpp (2)

2372-2387: Duplicate cleanup code — consider extracting helper.

This cleanup logic is nearly identical to the destructor code at lines 34-48. Extracting a private helper method (e.g., clear_mcp_query_rules()) would reduce duplication and maintenance burden.


2503-2507: Username matching effectively disabled.

The current logic skips any rule with a username set, making the username field non-functional. The TODO indicates this is intentional for now, but users may be confused when their username-based rules don't work.

🧹 Nitpick comments (7)
lib/Admin_Bootstrap.cpp (1)

911-914: Minor: Clarify the comment on line 914.

The comment "Reuse same schema for stats" is slightly misleading. The stats_mcp_query_rules table has a different, simpler schema (just rule_id and hits) compared to the full mcp_query_rules schema. Consider updating the comment to better reflect this.

📝 Suggested comment improvement
 	// MCP query digest stats
 	insert_into_tables_defs(tables_defs_stats,"stats_mcp_query_digest", STATS_SQLITE_TABLE_MCP_QUERY_DIGEST);
 	insert_into_tables_defs(tables_defs_stats,"stats_mcp_query_digest_reset", STATS_SQLITE_TABLE_MCP_QUERY_DIGEST_RESET);
-	insert_into_tables_defs(tables_defs_stats,"stats_mcp_query_rules", STATS_SQLITE_TABLE_MCP_QUERY_RULES); // Reuse same schema for stats
+	insert_into_tables_defs(tables_defs_stats,"stats_mcp_query_rules", STATS_SQLITE_TABLE_MCP_QUERY_RULES);
lib/MySQL_Catalog.cpp (2)

53-66: VLA is non-standard C++; prefer stack-allocated fixed buffer or std::vector.

Line 56 uses a variable-length array (VLA) which is a GCC extension and not standard C++. This can cause stack overflow for large paths.

♻️ Suggested fix
 int MySQL_Catalog::init() {
 	// Initialize database connection
 	db = new SQLite3DB();
-	char path_buf[db_path.size() + 1];
-	strcpy(path_buf, db_path.c_str());
-	int rc = db->open(path_buf, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
+	int rc = db->open((char*)db_path.c_str(), SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
 	if (rc != SQLITE_OK) {

476-488: Merge uses hardcoded empty schema and fixed kinds; document or parameterize.

The merge operation hardcodes empty schema "" and only searches for "table" and "view" kinds. This may limit flexibility for future use cases.

include/Discovery_Schema.h (2)

93-132: Mixed ownership semantics in MCP_Query_Processor_Output.

new_query uses new/delete while error_msg and OK_msg use strdup/free. This inconsistency increases cognitive load and bug risk. Consider unifying ownership using std::string or std::unique_ptr<std::string> for all string members.

♻️ Suggested modernization
 struct MCP_Query_Processor_Output {
-	std::string *new_query;   // Rewritten query (caller must delete)
+	std::unique_ptr<std::string> new_query;   // Rewritten query
 	int timeout_ms;           // Query timeout in milliseconds (-1 = not set)
-	char *error_msg;          // Error message to return (NULL = not set)
-	char *OK_msg;             // OK message to return (NULL = not set)
+	std::string error_msg;    // Error message (empty = not set)
+	std::string OK_msg;       // OK message (empty = not set)
 	int log;                  // Whether to log this query (-1 = not set, 0 = no, 1 = yes)
 	int next_query_flagIN;    // Flag for next query (-1 = not set)

149-156: Consider using a typed pointer instead of void* for digest stats map.

The mcp_digest_umap stores void* which loses type safety. Since the value type is always MCP_Query_Digest_Stats*, using the explicit type would prevent casting errors.

♻️ Suggested improvement
 	// MCP query digest statistics
-	std::unordered_map<std::string, std::unordered_map<uint64_t, void*>> mcp_digest_umap;
+	std::unordered_map<std::string, std::unordered_map<uint64_t, MCP_Query_Digest_Stats*>> mcp_digest_umap;
lib/Static_Harvester.cpp (1)

150-154: Hardcoded 30-second timeout; consider making configurable.

The connection timeouts are hardcoded. For environments with varying network latencies, this might be too short or unnecessarily long.

lib/Discovery_Schema.cpp (1)

62-75: VLA usage; same issue as MySQL_Catalog.

Line 65 uses a variable-length array. Apply the same fix as recommended for MySQL_Catalog.

Comment on lines 2362 to 2383
// LOAD MCP QUERY RULES FROM DISK / TO MEMORY
// Copies rules from persistent storage (disk.mcp_query_rules) to working memory (main.mcp_query_rules)
if (
	(query_no_space_length == strlen("LOAD MCP QUERY RULES FROM DISK") && !strncasecmp("LOAD MCP QUERY RULES FROM DISK", query_no_space, query_no_space_length))
	||
	(query_no_space_length == strlen("LOAD MCP QUERY RULES TO MEMORY") && !strncasecmp("LOAD MCP QUERY RULES TO MEMORY", query_no_space, query_no_space_length))
) {
	l_free(*ql,*q);
	*q=l_strdup("INSERT OR REPLACE INTO main.mcp_query_rules SELECT * FROM disk.mcp_query_rules");
	*ql=strlen(*q)+1;
	return true;
}

// SAVE MCP QUERY RULES TO DISK
// Copies rules from working memory (main.mcp_query_rules) to persistent storage (disk.mcp_query_rules)
if (
	(query_no_space_length == strlen("SAVE MCP QUERY RULES TO DISK") && !strncasecmp("SAVE MCP QUERY RULES TO DISK", query_no_space, query_no_space_length))
) {
	l_free(*ql,*q);
	*q=l_strdup("INSERT OR REPLACE INTO disk.mcp_query_rules SELECT * FROM main.mcp_query_rules");
	*ql=strlen(*q)+1;
	return true;


⚠️ Potential issue | 🟠 Major

Prevent stale MCP query rules when syncing disk ↔ memory.

INSERT OR REPLACE ... SELECT doesn’t remove rows deleted on the source side, so obsolete rules can persist and later be re-activated on runtime loads. Clear the target table before copying (or use a dedicated flush helper) to keep disk/memory in sync.

🧩 Proposed fix (clear target before copy)
-			l_free(*ql,*q);
-			*q=l_strdup("INSERT OR REPLACE INTO main.mcp_query_rules SELECT * FROM disk.mcp_query_rules");
-			*ql=strlen(*q)+1;
-			return true;
+			ProxySQL_Admin* SPA = (ProxySQL_Admin*)pa;
+			SPA->admindb->execute("DELETE FROM main.mcp_query_rules");
+			SPA->admindb->execute("INSERT OR REPLACE INTO main.mcp_query_rules SELECT * FROM disk.mcp_query_rules");
+			SPA->send_ok_msg_to_client(sess, NULL, 0, query_no_space);
+			return false;
-			l_free(*ql,*q);
-			*q=l_strdup("INSERT OR REPLACE INTO disk.mcp_query_rules SELECT * FROM main.mcp_query_rules");
-			*ql=strlen(*q)+1;
-			return true;
+			ProxySQL_Admin* SPA = (ProxySQL_Admin*)pa;
+			SPA->admindb->execute("DELETE FROM disk.mcp_query_rules");
+			SPA->admindb->execute("INSERT OR REPLACE INTO disk.mcp_query_rules SELECT * FROM main.mcp_query_rules");
+			SPA->send_ok_msg_to_client(sess, NULL, 0, query_no_space);
+			return false;
🤖 Prompt for AI Agents
In `@lib/Admin_Handler.cpp` around lines 2362 - 2383, The current INSERT OR
REPLACE SELECT statements in the LOAD and SAVE branches (where *q is set via
l_strdup to "INSERT OR REPLACE INTO main.mcp_query_rules SELECT * FROM
disk.mcp_query_rules" and "INSERT OR REPLACE INTO disk.mcp_query_rules SELECT *
FROM main.mcp_query_rules") do not remove source-deleted rows and can resurrect
stale rules; modify the SQL assigned to *q in both branches so the target table
is cleared before copying (e.g., issue a DELETE FROM <target_table> or use a
transactional sequence "DELETE FROM <target_table>; INSERT INTO <target_table>
SELECT ...") while keeping the same surrounding logic (the blocks that test
query_no_space_length and call l_free, l_strdup, set *ql and return true) and
ensure both the LOAD (target main.mcp_query_rules) and SAVE (target
disk.mcp_query_rules) cases are updated.

Comment on lines +2655 to +2684
char** pta = (char**)malloc(sizeof(char*) * 17);

pta[0] = strdup(std::to_string(rule->rule_id).c_str()); // rule_id
pta[1] = strdup(std::to_string(rule->active ? 1 : 0).c_str()); // active
pta[2] = rule->username ? strdup(rule->username) : NULL; // username
pta[3] = rule->schemaname ? strdup(rule->schemaname) : NULL; // schemaname
pta[4] = rule->tool_name ? strdup(rule->tool_name) : NULL; // tool_name
pta[5] = rule->match_pattern ? strdup(rule->match_pattern) : NULL; // match_pattern
pta[6] = strdup(std::to_string(rule->negate_match_pattern ? 1 : 0).c_str()); // negate_match_pattern
pta[7] = strdup(std::to_string(rule->re_modifiers).c_str()); // re_modifiers
pta[8] = strdup(std::to_string(rule->flagIN).c_str()); // flagIN
pta[9] = strdup(std::to_string(rule->flagOUT).c_str()); // flagOUT
pta[10] = rule->replace_pattern ? strdup(rule->replace_pattern) : NULL; // replace_pattern
pta[11] = strdup(std::to_string(rule->timeout_ms).c_str()); // timeout_ms
pta[12] = rule->error_msg ? strdup(rule->error_msg) : NULL; // error_msg
pta[13] = rule->ok_msg ? strdup(rule->ok_msg) : NULL; // OK_msg
pta[14] = strdup(std::to_string(rule->log ? 1 : 0).c_str()); // log
pta[15] = strdup(std::to_string(rule->apply ? 1 : 0).c_str()); // apply
pta[16] = rule->comment ? strdup(rule->comment) : NULL; // comment

result->add_row(pta);

// Free the row data
for (int j = 0; j < 17; j++) {
if (pta[j]) {
free(pta[j]);
}
}
free(pta);
}
⚠️ Potential issue | 🟡 Minor

No null checks after malloc/strdup allocations.

If malloc or strdup returns NULL due to memory exhaustion, the subsequent operations will cause a crash. While rare, this is flagged by static analysis.

Consider adding null checks or using a wrapper that handles allocation failures consistently across the codebase.
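A minimal sketch of such a wrapper (safe_strdup is a hypothetical name, not an existing ProxySQL helper, and the fail-fast policy is an assumption):

#include <cstdio>   // perror
#include <cstdlib>  // abort
#include <cstring>  // strdup

// Hypothetical helper: duplicates s, passing NULL through unchanged, and
// fails fast on allocation failure instead of crashing on a later dereference.
static char* safe_strdup(const char* s) {
    if (s == NULL) return NULL;
    char* copy = strdup(s);
    if (copy == NULL) {
        perror("strdup");
        abort();
    }
    return copy;
}

Each pta[i] = rule->field ? strdup(rule->field) : NULL line would then collapse to pta[i] = safe_strdup(rule->field), and the initial malloc of pta would get an analogous check.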

🧰 Tools
🪛 Cppcheck (2.19.0)

[warning] 2657-2673, 2679: If memory allocation fails, then there is a possible null pointer dereference (nullPointerOutOfMemory) — the same warning is emitted once per allocation, for each of lines 2657 through 2673 and for line 2679.

🤖 Prompt for AI Agents
In `@lib/Discovery_Schema.cpp` around lines 2655 - 2684, The code allocates pta
with malloc and many strdup calls (symbols: pta, malloc, strdup,
result->add_row) without checking for NULL; add checks after malloc and after
each strdup (or use a helper that allocates and duplicates safely) so that on
allocation failure you free any already-allocated entries, free pta, and avoid
calling result->add_row (return an error or handle gracefully). Specifically:
verify pta != NULL immediately after malloc, check each strdup result before
storing it into pta (or wrap duplication to return NULL on failure and
propagate), on any allocation failure free any non-NULL pta[i] and pta itself,
and only call result->add_row when all required allocations succeeded.

Comment on lines 1635 to 1637
if (qpo->timeout_ms > 0) {
timeout_sec = qpo->timeout_ms / 1000;
}
⚠️ Potential issue | 🟡 Minor

Integer division truncates sub-second timeouts to zero.

The conversion qpo->timeout_ms / 1000 truncates values less than 1000ms to 0 seconds. If a rule specifies a 500ms timeout, timeout_sec becomes 0, which may cause unexpected behavior depending on how zero is handled downstream.

Consider rounding up so that any non-zero timeout is at least 1 second; for example, (500 + 999) / 1000 = 1:

Suggested fix
 			// Apply timeout if provided
 			if (qpo->timeout_ms > 0) {
-				timeout_sec = qpo->timeout_ms / 1000;
+				timeout_sec = (qpo->timeout_ms + 999) / 1000;  // Round up to at least 1 sec
 			}
🤖 Prompt for AI Agents
In `@lib/Query_Tool_Handler.cpp` around lines 1635 - 1637, The conversion of
qpo->timeout_ms to timeout_sec uses integer division which truncates sub-second
values to zero (e.g., 500ms => 0); change the conversion logic in
Query_Tool_Handler.cpp around the timeout handling (variable timeout_sec and
qpo->timeout_ms) to round up instead of truncating so any non-zero millisecond
timeout becomes at least 1 second (e.g., replace the direct /1000 integer
division with a ceil-style conversion such as (timeout_ms + 999)/1000 or use
floating-point division plus ceil) and keep existing behavior when timeout_ms is
0 or negative.

Fixed 6 legitimate issues from AI code review:

Critical fixes:
- Implement stats___mcp_query_digest to populate table with actual data
- Fix double trigger bug (stats_mcp_query_digest_reset substring matching)

Important fixes:
- Fix re_modifiers parsing (VARCHAR "CASELESS" -> int 1)
- Fix TOCTOU race condition in reset path (use write lock from start)
- Add column count validation before accessing row fields

Documentation:
- Add memory ownership documentation for evaluate_mcp_query_rules

False positives ignored (Issue 2: Schema mismatch, Issue 6: SQL injection)
@renecannao
Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Jan 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@lib/ProxySQL_Admin_Stats.cpp`:
- Around line 2605-2625: The current loop builds SQL via sprintf into query
(using template a) which allows quotes/JSON in digest_text (and other
r->fields[]) to break SQL or permit injection; replace this with a prepared
INSERT into stats_mcp_query_digest and bind the 10 values from
r->fields[0]..r->fields[9] for each SQLite3_row instead of sprintf/malloc.
Locate the loop using SQLite3_row* r and statsdb->execute and change it to
prepare a parameterized statement once (e.g., "INSERT INTO
stats_mcp_query_digest VALUES (?,?,?,?,?,?,?,?,?,?)"), bind each r->fields[i]
per row, execute/reset the statement for the next row, and remove the
sprintf/malloc usage to eliminate injection and quoting issues.
- Around line 2594-2606: The reset path deletes from
stats_mcp_query_digest_reset but the subsequent insert always writes into
stats_mcp_query_digest; update the insert target based on the reset flag so the
reset branch re-populates stats_mcp_query_digest_reset. Concretely, change how
the INSERT string (variable a) is chosen before the loop in the block that calls
statsdb->execute and iterates resultset->rows: if reset is true set a to an
INSERT INTO stats_mcp_query_digest_reset ... string, otherwise keep the INSERT
INTO stats_mcp_query_digest ... string, then use that chosen string inside the
insertion loop so inserts go to the correct table.

In `@lib/ProxySQL_Admin.cpp`:
- Around line 7789-7793: The cleanup code is calling free(resultset) on an
object returned by execute_statement (an instance of SQLite3_result), which is
undefined behavior; replace that call with the appropriate C++ delete (i.e.,
delete resultset) and ensure any other cleanup paths that free resultset are
updated similarly so SQLite3_result is destroyed properly (look for the
resultset variable and usages around execute_statement and the success==false
branch in ProxySQL_Admin.cpp).
♻️ Duplicate comments (8)
lib/ProxySQL_Admin_Stats.cpp (1)

2662-2673: Same concern as prior review: raw sprintf with unescaped fields.

This is the same SQL construction pattern that was previously flagged. Please use prepared statements or escape values to avoid injection/breakage.

lib/Discovery_Schema.cpp (7)

34-48: Duplicate cleanup logic with load_mcp_query_rules.

This cleanup block is nearly identical to lines 2373-2387 in load_mcp_query_rules. Consider extracting to a private helper function (e.g., clear_mcp_query_rules()) to eliminate duplication and reduce maintenance risk.
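A possible shape for that helper (a sketch only: the class name, the rules container, and the assumption that string members are strdup()'d while rule objects are new'd are inferred, not confirmed by the PR):

// Hypothetical private helper consolidating the duplicated cleanup blocks.
// free(NULL) is a no-op, so the string members need no individual null checks.
void MCP_Query_Rules_Manager::clear_mcp_query_rules() {
    for (MCP_Query_Rule* rule : rules) {
        free(rule->username);
        free(rule->schemaname);
        free(rule->tool_name);
        free(rule->match_pattern);
        free(rule->replace_pattern);
        free(rule->error_msg);
        free(rule->ok_msg);
        free(rule->comment);
        delete rule;
    }
    rules.clear();
}

load_mcp_query_rules and the block at lines 34-48 could then both call this helper instead of repeating the loop.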


351-372: Schema mismatch between Discovery_Schema and admin tables.

This mcp_query_rules table schema differs from the admin table definitions (includes run_id, action, hits columns not present in admin tables). If rules are loaded from admin tables via load_mcp_query_rules, the column layouts won't align.

Verify whether this table is actually used or if it should be removed/synchronized with admin table definitions.


2529-2533: Username matching is non-functional.

Rules with username specified are always skipped since username context isn't available in MCP. This makes the username field in mcp_query_rules effectively unusable. The TODO should be tracked as a follow-up issue.


2632-2635: Redundant rule->apply check.

At line 2633, rule->apply is guaranteed to be true because log-only rules (!rule->apply) already continue at line 2575. The condition is logically redundant.

♻️ Suggested simplification
-			// If apply is true and not a log-only rule, stop processing further rules
-			if (rule->apply) {
-				break;
-			}
+			// Stop processing further rules (log-only rules already continued above)
+			break;

2865-2865: Confusing historical comment.

The comment mentions "digest_text was duplicated" which references a past bug fix rather than describing current state. Consider simplifying to just document the current column count.

📝 Suggested simplification
-	// Define columns (10 columns, not 11 - digest_text was duplicated)
+	// Define columns for MCP query digest statistics (10 columns)

2681-2710: No null checks after malloc/strdup allocations.

If malloc or strdup returns NULL due to memory exhaustion, subsequent operations will cause undefined behavior. While OOM is rare, the static analyzer flags this consistently across the function.

Consider adding a null check after malloc or using a wrapper that handles allocation failures consistently.


3000-3001: Incorrect example in documentation.

The example output {"filters":{"?":"?","?":"?"}} is wrong. The code at line 3017 uses it.key() which preserves object keys. The correct output should be:

Output: {"filters":{"status":"?","age":"?"}}
📝 Suggested fix
-//   Output: {"filters":{"?":"?","?":"?"}}
+//   Output: {"filters":{"status":"?","age":"?"}}
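A minimal sketch of the key-preserving normalization being documented (assuming nlohmann::json, which the it.key() idiom suggests; ordered_json is used here so the output matches the documented key order — the actual digest code may differ):

#include <nlohmann/json.hpp>
#include <iostream>

int main() {
    // ordered_json preserves insertion order, matching the documented output.
    nlohmann::ordered_json j = nlohmann::ordered_json::parse(
        R"({"filters":{"status":"active","age":30}})");
    // Replace each value with "?" while keeping the original keys (it.key()).
    for (auto it = j["filters"].begin(); it != j["filters"].end(); ++it) {
        *it = "?";
    }
    std::cout << j.dump() << std::endl;  // {"filters":{"status":"?","age":"?"}}
    return 0;
}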

Comment on lines +2594 to +2606
statsdb->execute("BEGIN");

if (reset) {
statsdb->execute("DELETE FROM stats_mcp_query_digest_reset");
} else {
statsdb->execute("DELETE FROM stats_mcp_query_digest");
}

// Insert digest statistics into the stats table
// Columns: tool_name, run_id, digest, digest_text, count_star,
// first_seen, last_seen, sum_time, min_time, max_time
char* a = (char*)"INSERT INTO stats_mcp_query_digest VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
⚠️ Potential issue | 🟠 Major

Reset path inserts into the wrong table.

When reset is true you delete stats_mcp_query_digest_reset, but the insert always targets stats_mcp_query_digest. That leaves the reset table empty and overwrites the non‑reset view on reset reads.

🛠️ Suggested fix
-    char* a = (char*)"INSERT INTO stats_mcp_query_digest VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
+    const char* table = reset ? "stats_mcp_query_digest_reset" : "stats_mcp_query_digest";
+    std::string a = std::string("INSERT INTO ") + table +
+        " VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
...
-    char* query = (char*)malloc(strlen(a) + arg_len + 32);
-    sprintf(query, a,
+    char* query = (char*)malloc(a.size() + arg_len + 32);
+    sprintf(query, a.c_str(),
         r->fields[0], r->fields[1], r->fields[2], r->fields[3], r->fields[4],
         r->fields[5], r->fields[6], r->fields[7], r->fields[8], r->fields[9]);
🤖 Prompt for AI Agents
In `@lib/ProxySQL_Admin_Stats.cpp` around lines 2594 - 2606, The reset path
deletes from stats_mcp_query_digest_reset but the subsequent insert always
writes into stats_mcp_query_digest; update the insert target based on the reset
flag so the reset branch re-populates stats_mcp_query_digest_reset. Concretely,
change how the INSERT string (variable a) is chosen before the loop in the block
that calls statsdb->execute and iterates resultset->rows: if reset is true set a
to an INSERT INTO stats_mcp_query_digest_reset ... string, otherwise keep the
INSERT INTO stats_mcp_query_digest ... string, then use that chosen string
inside the insertion loop so inserts go to the correct table.

Comment on lines +2605 to +2625
char* a = (char*)"INSERT INTO stats_mcp_query_digest VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
SQLite3_row* r = *it;
int arg_len = 0;
for (int i = 0; i < 10; i++) {
arg_len += strlen(r->fields[i]);
}
char* query = (char*)malloc(strlen(a) + arg_len + 32);
sprintf(query, a,
r->fields[0], // tool_name
r->fields[1], // run_id
r->fields[2], // digest
r->fields[3], // digest_text
r->fields[4], // count_star
r->fields[5], // first_seen
r->fields[6], // last_seen
r->fields[7], // sum_time
r->fields[8], // min_time
r->fields[9] // max_time
);
statsdb->execute(query);
⚠️ Potential issue | 🟠 Major

Avoid raw sprintf inserts for digest rows.

digest_text and other fields can contain quotes/JSON, which will break the SQL and can enable injection into the stats DB. Use a prepared statement with bound parameters (as you already do in other stats functions).

🛠️ Suggested approach (prepared statement)
-    char* a = (char*)"INSERT INTO stats_mcp_query_digest VALUES (\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",\"%s\")";
-    for (...) {
-        ...
-        sprintf(query, a, ...);
-        statsdb->execute(query);
-        free(query);
-    }
+    const char* sql = reset
+        ? "INSERT INTO stats_mcp_query_digest_reset VALUES (?1,?2,?3,?4,?5,?6,?7,?8,?9,?10)"
+        : "INSERT INTO stats_mcp_query_digest VALUES (?1,?2,?3,?4,?5,?6,?7,?8,?9,?10)";
+    sqlite3_stmt* stmt = nullptr;
+    int rc = statsdb->prepare_v2(sql, &stmt);
+    ASSERT_SQLITE_OK(rc, statsdb);
+    for (SQLite3_row* r : resultset->rows) {
+        rc = (*proxy_sqlite3_bind_text)(stmt, 1, r->fields[0], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 2, r->fields[1], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 3, r->fields[2], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 4, r->fields[3], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 5, r->fields[4], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 6, r->fields[5], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 7, r->fields[6], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 8, r->fields[7], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt, 9, r->fields[8], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_bind_text)(stmt,10, r->fields[9], -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
+        SAFE_SQLITE3_STEP2(stmt);
+        rc = (*proxy_sqlite3_clear_bindings)(stmt); ASSERT_SQLITE_OK(rc, statsdb);
+        rc = (*proxy_sqlite3_reset)(stmt); ASSERT_SQLITE_OK(rc, statsdb);
+    }
+    (*proxy_sqlite3_finalize)(stmt);
🤖 Prompt for AI Agents
In `@lib/ProxySQL_Admin_Stats.cpp` around lines 2605 - 2625, The current loop
builds SQL via sprintf into query (using template a) which allows quotes/JSON in
digest_text (and other r->fields[]) to break SQL or permit injection; replace
this with a prepared INSERT into stats_mcp_query_digest and bind the 10 values
from r->fields[0]..r->fields[9] for each SQLite3_row instead of sprintf/malloc.
Locate the loop using SQLite3_row* r and statsdb->execute and change it to
prepare a parameterized statement once (e.g., "INSERT INTO
stats_mcp_query_digest VALUES (?,?,?,?,?,?,?,?,?,?)"), bind each r->fields[i]
per row, execute/reset the statement for the next row, and remove the
sprintf/malloc usage to eliminate injection and quoting issues.

Comment on lines +7789 to +7793
if (success == false) {
if (resultset) {
free(resultset);
}
}
⚠️ Potential issue | 🟠 Major

Use delete for SQLite3_result cleanup.
execute_statement returns a heap-allocated C++ object (SQLite3_result); releasing it with free() skips the destructor and mismatches the allocator, which is undefined behavior.

🐛 Proposed fix
-    if (resultset) {
-        free(resultset);
-    }
+    if (resultset) {
+        delete resultset;
+    }
🤖 Prompt for AI Agents
In `@lib/ProxySQL_Admin.cpp` around lines 7789 - 7793, The cleanup code is calling
free(resultset) on an object returned by execute_statement (an instance of
SQLite3_result), which is undefined behavior; replace that call with the
appropriate C++ delete (i.e., delete resultset) and ensure any other cleanup
paths that free resultset are updated similarly so SQLite3_result is destroyed
properly (look for the resultset variable and usages around execute_statement
and the success==false branch in ProxySQL_Admin.cpp).

- Add special build rule in Makefile for anomaly_detection-t that includes:
  - $(OBJ) for global variables (GloVars, GloGATH)
  - -Wl,--allow-multiple-definition to allow test's main() to override ProxySQL's
  - ClickHouse client libraries (libclickhouse-cpp-lib.a, libzstdstatic.a, liblz4.a)
  - SQLite rembed library (libsqlite_rembed.a)
  - -lscram -lusual for PostgreSQL SCRAM support
- Add stub function SQLite3_Server_session_handler required by SQLite3_Server.cpp

Resolves compilation errors for anomaly_detection-t test.
- Add #include <string> for C++ std::string support
- Add #include <cmath> for sqrt() function
- Change format %lld to %ld for chrono duration types (long int, not long long)

Resolves compilation errors for vector_db_performance-t test.
- Add #include <string> for C++ std::string support

Resolves compilation errors for ai_llm_retry_scenarios-t test.
This commit addresses valid concerns raised by coding agents (Gemini, Copilot, CoderabbitAI):

1. Fix stats_mcp_query_digest naming conflict (ProxySQL_Admin.cpp):
   - Made reset and non-reset paths mutually exclusive using else block
   - Prevents both flags from being true, matching MySQL pattern
   - Ensures reset takes precedence over non-reset

2. Fix INSERT OR REPLACE sync issue (Admin_Handler.cpp):
   - Added DELETE before INSERT OR REPLACE in LOAD/SAVE MCP QUERY RULES
   - Prevents stale rules from persisting when syncing disk <-> memory
   - Ensures deleted source rows are also removed from target

3. Fix integer division truncation for timeout (Query_Tool_Handler.cpp):
   - Changed timeout_ms/1000 to (timeout_ms+999)/1000 for ceiling division
   - Ensures sub-second timeouts (e.g., 500ms) become at least 1 second
   - Prevents zero-second timeouts from causing unexpected behavior

4. Remove confusing comment (Discovery_Schema.cpp):
   - Simplified column count comment to be clear and accurate

Note: The re_modifiers parsing code already correctly handles VARCHAR "CASELESS"
to int conversion (lines 2414-2425), so that review comment was already addressed.
@renecannao
Author

Thank you to the three coding agents for the thorough code review!

Summary of Fixes

I've addressed the valid concerns raised by @gemini-code-assist[bot], @copilot, and @coderabbitai[bot] in a follow-up commit:

Issues Fixed:

  1. stats_mcp_query_digest naming conflict (ProxySQL_Admin.cpp)

    • Made reset and non-reset paths mutually exclusive using an else block
    • Matches the MySQL pattern for consistency
  2. INSERT OR REPLACE sync issue (Admin_Handler.cpp)

    • Added DELETE before INSERT OR REPLACE in LOAD/SAVE MCP QUERY RULES
    • Prevents stale rules from persisting when syncing disk ↔ memory
  3. Integer division truncation for timeout (Query_Tool_Handler.cpp)

    • Changed timeout_ms / 1000 to ceiling division: (timeout_ms + 999) / 1000
    • Ensures sub-second timeouts (e.g., 500ms) become at least 1 second
  4. Confusing comment about column count (Discovery_Schema.cpp)

    • Simplified the comment for clarity

Already Correct:

  • re_modifiers parsing - The code already properly handles VARCHAR "CASELESS" to int conversion (lines 2414-2425 in Discovery_Schema.cpp)

Items Requiring Further Discussion (deferred to future work):

Some excellent suggestions were noted for future enhancements:

  • Using std::unique_ptr instead of raw pointers for automatic memory management (see the sketch after this list)
  • Adding NULL checks after malloc/strdup allocations
  • Extracting cleanup logic into helper functions to reduce duplication
  • Completing the pieces that currently carry TODOs (e.g., username matching)

These are good ideas for follow-up PRs to improve code quality and robustness.
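As one illustration of the smart-pointer idea (a sketch under assumptions — this is not how the PR currently manages rows, and the FreeDeleter helper is hypothetical):

#include <cstdlib>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical deleter so std::unique_ptr can own strdup()'d C strings.
struct FreeDeleter {
    void operator()(char* p) const { free(p); }
};
using c_str_ptr = std::unique_ptr<char, FreeDeleter>;

int main() {
    // Each element frees itself when the vector is destroyed, even on an
    // early return, removing the need for the manual cleanup loop.
    std::vector<c_str_ptr> row;
    row.emplace_back(strdup("1"));        // rule_id
    row.emplace_back(strdup("CASELESS")); // re_modifiers
    return 0;
}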

Thanks again for the thoughtful reviews! 🙏


Copilot AI commented Jan 21, 2026

@renecannao I've opened a new pull request, #24, to work on those changes. Once the pull request is ready, I'll request review from you.

@gemini-code-assist

Hi @renecannao,

Thank you for the detailed summary of the fixes and for addressing the feedback from the review! It's great to see these improvements implemented.

I particularly appreciate the fixes for:

  • stats_mcp_query_digest naming conflict: Ensuring mutual exclusivity for reset and non-reset paths is crucial for data integrity and consistency with existing patterns.
  • Integer division truncation for timeout: The change to ceiling division for timeout_ms correctly handles sub-second timeouts, preventing unexpected behavior.
  • Confusing comment about column count: Clarifying comments always improves maintainability.

I also agree that the suggestions regarding std::unique_ptr, NULL checks for malloc/strdup, extracting cleanup logic, and completing TODO implementations are excellent ideas for follow-up PRs. Focusing on the core functionality first and then enhancing robustness and code quality in subsequent iterations is a sound approach.

Thanks again for your thorough work on this pull request! 🙏

@coderabbitai

coderabbitai bot commented Jan 21, 2026

Tip

For best results, initiate chat on the files or code changes.

@renecannao, thank you for the detailed summary and for addressing the review feedback so promptly! 🎉

Your fixes look solid:

  • The stats_mcp_query_digest mutual exclusion with if/else is a clean solution that matches existing patterns
  • Adding DELETE before INSERT OR REPLACE prevents stale data during sync operations
  • The ceiling division for timeout ensures sub-second timeouts don't get truncated to zero
  • Clarifying the column count comment improves readability

Great to hear the re_modifiers parsing was already handling VARCHAR-to-int conversion correctly!

Deferring the enhancement suggestions (smart pointers, NULL checks, helper extraction, TODO completion) to future PRs is a pragmatic approach that keeps this PR focused. Those are excellent candidates for follow-up quality improvements.

Thanks for being so responsive to the feedback! If you'd like me to review the new commit with these fixes, just let me know. 🙏

@renecannao renecannao merged commit af28598 into v3.1-MCP2 Jan 21, 2026
1 check passed