Component
SQL collection scripts
Performance Monitor Version
PerformanceMonitor v2.0.0
SQL Server Version
SQL Server 2019 Standard Edition (15.0.2130.3)
Windows Version
Windows Server 2019
Describe the Bug
collect.query_stats_collector collects execution plans via
sys.dm_exec_text_query_plan for every query in
sys.dm_exec_query_stats. On busy instances with many databases,
this causes unsustainable storage growth.
The plan XML is already collected separately by query_store_collector
in collect.query_store_data, making the query_plan_text column
in collect.query_stats redundant.
Additionally, the default collection frequency of 2 minutes combined
with the Hung Job Monitor threshold of 5 minutes means the collector
is systematically killed before it can finish on busy instances,
while still generating massive I/O and tempdb pressure on every attempt.
Steps to Reproduce
-
Install PerformanceMonitor v2.0.0 on a SQL Server instance
with 20+ active databases
-
Let the collection job run at its default schedule (every 2 minutes)
-
Check storage consumption of collect.query_stats after a few hours:
SELECT
COUNT() AS nb_lignes,
SUM(DATALENGTH(query_plan_text)) / 1024 / 1024 AS plans_xml_mb,
SUM(DATALENGTH(query_text)) / 1024 / 1024 AS query_text_mb,
SUM(DATALENGTH(query_plan_text) + DATALENGTH(query_text))
/ COUNT() / 1024 AS avg_kb_per_row
FROM collect.query_stats;
-
Check collection_log for duration and rows collected per run
Expected Behavior
Storage growth of collect.query_stats should be proportional to
query activity and manageable within the configured retention window.
With query_text only (~1.3 KB per row), 7 days of data at 25,000 rows
per 15-minute collection cycle would represent ~150 MB —
perfectly manageable.
The collector should complete within its max_duration_minutes
without being killed by the Hung Job Monitor.
Actual Behavior
On our environment (20+ active databases, collection every 15 minutes
after we manually corrected the frequency):
collect.query_stats after ~2 hours of collection:
- 160,566 rows
- 11.4 GB total
- of which query_plan_text alone: 6.5 GB (97%)
- of which query_text: 143 MB (1%)
Average row size: 63 KB — almost entirely due to the XML plan.
At ~25,000 new rows per collection cycle every 15 minutes:
- ~1.5 GB added every 15 minutes
- ~6 GB per hour
- Projected: ~144 GB/day
At the default frequency of 2 minutes the situation is 3x worse.
The PerformanceMonitor database exhausted all available
disk space within hours of installation.
Error Messages / Log Output
No explicit errors — the collector runs successfully but silently
consumes disk space until the drive is full.
Observed in collection_log:
collector_name rows_collected duration_ms
query_stats_collector 37,311 82,899
query_stats_collector 37,237 115,746
query_stats_collector 32,681 110,120
query_stats_collector 27,874 59,553
query_stats_collector 25,463 53,427
query_stats_collector 23,330 71,789
At default frequency (every 2 minutes), the Hung Job Monitor
kills the job after 5 minutes on every run:
collector_name collection_status error_message
hung_job_monitor JOB_HUNG Job "PerformanceMonitor - Collection"
hung for 7 minutes (threshold: 5 minutes).
Stopping job.
Screenshots
No response
Additional Context
Environment:
- SQL Server 2019 Standard Edition (15.0.2130.3)
- Windows Server 2019
- PerformanceMonitor v2.0.0
- 20+ active databases
Root cause #1 — Plan XML storage:
The OUTER APPLY to sys.dm_exec_text_query_plan fetches and stores
the full XML execution plan for every collected query:
-- This single column represents 97% of table size:
query_plan_text = tqp.query_plan
OUTER APPLY sys.dm_exec_text_query_plan
(
qs.plan_handle,
qs.statement_start_offset,
qs.statement_end_offset
) AS tqp
Root cause #2 — Default frequency too aggressive:
Default schedule for heavy collectors:
collector_name frequency_minutes max_duration_minutes
query_stats_collector 2 5
query_store_collector 2 10
procedure_stats_collector 2 10
Combined with Hung Job Monitor threshold of 5 minutes, these
collectors are systematically killed before completion on busy instances.
Proposed fixes:
Fix #1 — Disable plan XML collection in query_stats_collector:
-- Remove:
OUTER APPLY sys.dm_exec_text_query_plan(...) AS tqp
-- Replace:
query_plan_text = tqp.query_plan
-- With:
query_plan_text = NULL
This reduces average row size from ~63 KB to ~1.3 KB —
a 98% reduction in storage consumption.
Alternatively, make plan collection opt-in via a parameter:
@collect_query_plans BIT = 0
Fix #2 — Increase default frequency for heavy collectors
in installation scripts:
-- In 04_create_schedule_table.sql or equivalent:
query_stats_collector: frequency_minutes = 15, max_duration_minutes = 12
query_store_collector: frequency_minutes = 15, max_duration_minutes = 12
procedure_stats_collector: frequency_minutes = 15, max_duration_minutes = 12
plan_cache_stats_collector: frequency_minutes = 10, max_duration_minutes = 8
Fix #3 — Increase Hung Job Monitor default threshold
in 45_create_agent_jobs.sql:
-- Replace:
@normal_max_duration_minutes = 5
-- With:
@normal_max_duration_minutes = 15
Workarounds applied on our environment:
-
Patched collect.query_stats_collector locally to set
query_plan_text = NULL and removed OUTER APPLY
sys.dm_exec_text_query_plan. After truncating collect.query_stats
(11.4 GB) and applying the patch, projected storage growth dropped
from ~144 GB/day to ~150 MB/7 days.
-
Updated collection frequencies:
UPDATE config.collection_schedule
SET frequency_minutes = 15, max_duration_minutes = 12
WHERE collector_name IN
(
N'query_stats_collector',
N'query_store_collector',
N'procedure_stats_collector'
);
UPDATE config.collection_schedule
SET frequency_minutes = 10, max_duration_minutes = 8
WHERE collector_name = N'plan_cache_stats_collector';
-
Updated Hung Job Monitor threshold:
EXEC msdb.dbo.sp_update_jobstep
@job_name = N'PerformanceMonitor - Hung Job Monitor',
@step_id = 1,
@command = N'EXECUTE config.check_hung_collector_job
@job_name = N''PerformanceMonitor - Collection'',
@normal_max_duration_minutes = 15,
@first_run_max_duration_minutes = 30,
@stop_hung_job = 1,
@debug = 0;';
Component
SQL collection scripts
Performance Monitor Version
PerformanceMonitor v2.0.0
SQL Server Version
SQL Server 2019 Standard Edition (15.0.2130.3)
Windows Version
Windows Server 2019
Describe the Bug
collect.query_stats_collectorcollects execution plans viasys.dm_exec_text_query_planfor every query insys.dm_exec_query_stats. On busy instances with many databases,this causes unsustainable storage growth.
The plan XML is already collected separately by
query_store_collectorin
collect.query_store_data, making thequery_plan_textcolumnin
collect.query_statsredundant.Additionally, the default collection frequency of 2 minutes combined
with the Hung Job Monitor threshold of 5 minutes means the collector
is systematically killed before it can finish on busy instances,
while still generating massive I/O and tempdb pressure on every attempt.
Steps to Reproduce
Install PerformanceMonitor v2.0.0 on a SQL Server instance
with 20+ active databases
Let the collection job run at its default schedule (every 2 minutes)
Check storage consumption of collect.query_stats after a few hours:
SELECT
COUNT() AS nb_lignes,
SUM(DATALENGTH(query_plan_text)) / 1024 / 1024 AS plans_xml_mb,
SUM(DATALENGTH(query_text)) / 1024 / 1024 AS query_text_mb,
SUM(DATALENGTH(query_plan_text) + DATALENGTH(query_text))
/ COUNT() / 1024 AS avg_kb_per_row
FROM collect.query_stats;
Check collection_log for duration and rows collected per run
Expected Behavior
Storage growth of collect.query_stats should be proportional to
query activity and manageable within the configured retention window.
With query_text only (~1.3 KB per row), 7 days of data at 25,000 rows
per 15-minute collection cycle would represent ~150 MB —
perfectly manageable.
The collector should complete within its max_duration_minutes
without being killed by the Hung Job Monitor.
Actual Behavior
On our environment (20+ active databases, collection every 15 minutes
after we manually corrected the frequency):
collect.query_stats after ~2 hours of collection:
Average row size: 63 KB — almost entirely due to the XML plan.
At ~25,000 new rows per collection cycle every 15 minutes:
At the default frequency of 2 minutes the situation is 3x worse.
The PerformanceMonitor database exhausted all available
disk space within hours of installation.
Error Messages / Log Output
Screenshots
No response
Additional Context
Environment:
Root cause #1 — Plan XML storage:
The OUTER APPLY to sys.dm_exec_text_query_plan fetches and stores
the full XML execution plan for every collected query:
Root cause #2 — Default frequency too aggressive:
Default schedule for heavy collectors:
Combined with Hung Job Monitor threshold of 5 minutes, these
collectors are systematically killed before completion on busy instances.
Proposed fixes:
Fix #1 — Disable plan XML collection in query_stats_collector:
This reduces average row size from ~63 KB to ~1.3 KB —
a 98% reduction in storage consumption.
Alternatively, make plan collection opt-in via a parameter:
Fix #2 — Increase default frequency for heavy collectors
in installation scripts:
Fix #3 — Increase Hung Job Monitor default threshold
in 45_create_agent_jobs.sql:
Workarounds applied on our environment:
Patched collect.query_stats_collector locally to set
query_plan_text = NULL and removed OUTER APPLY
sys.dm_exec_text_query_plan. After truncating collect.query_stats
(11.4 GB) and applying the patch, projected storage growth dropped
from ~144 GB/day to ~150 MB/7 days.
Updated collection frequencies:
UPDATE config.collection_schedule
SET frequency_minutes = 15, max_duration_minutes = 12
WHERE collector_name IN
(
N'query_stats_collector',
N'query_store_collector',
N'procedure_stats_collector'
);
UPDATE config.collection_schedule
SET frequency_minutes = 10, max_duration_minutes = 8
WHERE collector_name = N'plan_cache_stats_collector';
Updated Hung Job Monitor threshold:
EXEC msdb.dbo.sp_update_jobstep
@job_name = N'PerformanceMonitor - Hung Job Monitor',
@step_id = 1,
@command = N'EXECUTE config.check_hung_collector_job
@job_name = N''PerformanceMonitor - Collection'',
@normal_max_duration_minutes = 15,
@first_run_max_duration_minutes = 30,
@stop_hung_job = 1,
@debug = 0;';