From 94620e91ae55aeca7342bf62988f1037c5a2dd25 Mon Sep 17 00:00:00 2001 From: Michael Rademaker Date: Wed, 29 Oct 2025 11:36:50 +0100 Subject: [PATCH 1/5] docs: add query exporter monitoring documentation --- docs/developer-guide/query-exporter.md | 170 +++++++++++++++++++++++++ 1 file changed, 170 insertions(+) create mode 100644 docs/developer-guide/query-exporter.md diff --git a/docs/developer-guide/query-exporter.md b/docs/developer-guide/query-exporter.md new file mode 100644 index 00000000..72c6206f --- /dev/null +++ b/docs/developer-guide/query-exporter.md @@ -0,0 +1,170 @@ +--- +sidebar_position: 18 +--- + +# Query Exporter + +The query-exporter service monitors the OpenRemote PostgreSQL database and exposes metrics on port 9560 for Prometheus scraping. It uses [query-exporter](https://github.com/albertodonato/query-exporter) to collect database health metrics. + +## Available Metrics + +### Table and Index Bloat +- `pg_table_bloat_count` - Number of tables/indexes with bloat exceeding thresholds +- `pg_table_bloat_ratio` - Bloat ratio per table/index (1.0 = no bloat, 2.0 = 100% bloat) +- `pg_table_bloat_bytes` - Estimated bloat size in bytes per table/index +- `pg_table_bloat_wasted_mb` - Estimated wasted space in megabytes per table/index + +### Autovacuum Workers +- `pg_autovacuum_workers_active` - Number of currently active autovacuum workers +- `pg_autovacuum_workers_max` - Maximum number of autovacuum workers configured +- `pg_autovacuum_running` - Running autovacuum processes (labels: database, table_schema, table_name, phase) + +### Datapoint Query Performance +- `pg_datapoint_query_duration_seconds` - Histogram of execution times for the attribute with most datapoints +- `pg_datapoint_count` - Total number of datapoints for the top attribute + +### Database Health +- `pg_database_size_megabytes` - Total database size in megabytes +- `pg_connections_active` - Number of active connections +- `pg_connections_idle` - Number of idle connections +- 
`pg_locks_count` - Number of locks by type + +## Configuration + +### Environment Variables +The service uses the following environment variables (automatically configured in `deploy.yml`): + +**Database Connection:** +- `POSTGRES_HOST` - Database host (default: `postgresql`) +- `POSTGRES_PORT` - Database port (default: `5432`) +- `POSTGRES_DB` - Database name (default: `openremote`) +- `POSTGRES_USER` - Database user (default: `postgres`) +- `POSTGRES_PASSWORD` - Database password (default: `postgres`) + +**Bloat Thresholds:** +- `TABLE_BLOAT_THRESHOLD` - Table bloat ratio threshold (default: `1.2` = 20% bloat) +- `INDEX_BLOAT_THRESHOLD` - Index bloat ratio threshold (default: `1.5` = 50% bloat) + +:::note + +Indexes typically bloat faster than tables, so the default index threshold is higher. + +::: + +### Customize Thresholds +Set environment variables before starting services: +```bash +export TABLE_BLOAT_THRESHOLD=1.3 # 30% table bloat +export INDEX_BLOAT_THRESHOLD=2.0 # 100% index bloat +``` + +### Query Intervals +- Table bloat queries: Every 5 minutes +- Autovacuum queries: Every 30 seconds +- Datapoint performance: Every 60 seconds +- Database size: Every 5 minutes +- Connection/lock stats: Every 30 seconds + +## Accessing Metrics + +### View Metrics Endpoint +```bash +curl http://localhost:9560/metrics +``` + +:::note + +In `dev-testing.yml`, the port is exposed as `9560:9560`. In production (`deploy.yml`), it's bound to `127.0.0.1:9560:9560` for security. + +::: + +### Expose on Private Network +To expose on a private network in production, uncomment this line in `deploy.yml`: +```yaml +- "${PRIVATE_IP:-127.0.0.1}:9560:9560" +``` + +## Prometheus Integration + +Add this scrape configuration to your Prometheus config: + +```yaml +scrape_configs: + - job_name: 'openremote-postgres' + static_configs: + - targets: ['localhost:9560'] + scrape_interval: 30s +``` + +## Customizing Queries + +To modify queries or add new metrics: + +1. 
Edit `config.yaml` in the query-exporter configuration directory +2. Restart the service: +```bash +docker-compose restart query-exporter +``` + +## Troubleshooting + +### Check Service Logs +```bash +docker-compose logs -f query-exporter +``` + +### Test Database Connectivity +```bash +docker-compose exec query-exporter sh +apk add postgresql-client +psql -h $POSTGRES_HOST -U $POSTGRES_USER -d $POSTGRES_DB +``` + +### Verify Metrics Endpoint +```bash +curl http://localhost:9560/metrics +``` + +## Performance Tuning + +If bloat detection queries impact database performance: + +- **Increase query interval** - Change from 300s to 600s or higher in `config.yaml` +- **Limit to specific schemas** - Modify queries to target specific schemas only +- **Schedule off-peak runs** - Use `schedule` option instead of `interval` +- **Reduce sample size** - Lower the datapoint query sample size (default: 100) + +### Query Complexity +- Bloat detection scans `pg_stats` and `pg_class` catalogs (limited to top 50 results) +- Datapoint performance samples 100 most recent datapoints from the largest attribute +- All queries exclude PostgreSQL system schemas (`pg_%` and `information_schema`) + +## Understanding Bloat + +### Bloat Ratio Values +- `1.0` - No bloat (optimal size) +- `1.2` - 20% bloat (default table threshold) +- `1.5` - 50% bloat (default index threshold) +- `2.0` - 100% bloat (object is twice the optimal size) + +### Maintenance Actions +- **Tables > 1.2** - Run `VACUUM FULL` during maintenance window +- **Indexes > 1.5** - Run `REINDEX` on affected indexes +- **Critical bloat (> 2.0)** - Immediate maintenance recommended + +### PostgreSQL Constants +The bloat detection queries use these PostgreSQL internal constants: +- `1048576` - Bytes per megabyte (1024 × 1024) +- `8` - Bits per byte (for null bitmap calculation) +- `20` - Page header size in bytes +- `12` - Index header overhead in bytes +- `4` - Item pointer size in bytes +- `23` - Tuple header size for PostgreSQL 
14+ (Linux) +- `4` - Memory alignment for Linux containers + +## References + +- [Query Exporter Documentation](https://github.com/albertodonato/query-exporter) +- [Configuration Format](https://github.com/albertodonato/query-exporter/blob/main/docs/configuration.rst) +- [PostgreSQL Statistics Views](https://www.postgresql.org/docs/current/monitoring-stats.html) +- [PostgreSQL Bloat Detection](https://wiki.postgresql.org/wiki/Show_database_bloat) From 529f9c9ceb69d579b5e50da5e7ccf0f5051f984f Mon Sep 17 00:00:00 2001 From: Michael Rademaker Date: Wed, 26 Nov 2025 11:29:22 +0100 Subject: [PATCH 2/5] Improved documentation based on feedback --- docs/developer-guide/query-exporter.md | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/docs/developer-guide/query-exporter.md b/docs/developer-guide/query-exporter.md index 72c6206f..60fe43a7 100644 --- a/docs/developer-guide/query-exporter.md +++ b/docs/developer-guide/query-exporter.md @@ -32,7 +32,7 @@ The query-exporter service monitors the OpenRemote PostgreSQL database and expos ## Configuration ### Environment Variables -The service uses the following environment variables (automatically configured in `deploy.yml`): +The service uses the following environment variables (automatically configured in `profile/deploy.yml`): **Database Connection:** - `POSTGRES_HOST` - Database host (default: `postgresql`) @@ -72,14 +72,8 @@ export INDEX_BLOAT_THRESHOLD=2.0 # 100% index bloat curl http://localhost:9560/metrics ``` -:::note - -In `dev-testing.yml`, the port is exposed as `9560:9560`. In production (`deploy.yml`), it's bound to `127.0.0.1:9560:9560` for security. 
- -::: - ### Expose on Private Network -To expose on a private network in production, uncomment this line in `deploy.yml`: +To expose on a private network in production, uncomment this line in `profile/deploy.yml`: ```yaml - "${PRIVATE_IP:-127.0.0.1}:9560:9560" ``` @@ -100,22 +94,22 @@ scrape_configs: To modify queries or add new metrics: -1. Edit `config.yaml` in the query-exporter configuration directory +1. Edit the `config.yaml` file located in the `query-exporter` configuration directory (by default, this is `/deployment/query-exporter/config.yaml` which is mounted as a Docker volume at the container's `/config/config.yaml` path—see your `profile/deploy.yml` for the exact path). 2. Restart the service: ```bash -docker-compose restart query-exporter +docker-compose -f profile/deploy.yml restart query-exporter ``` ## Troubleshooting ### Check Service Logs ```bash -docker-compose logs -f query-exporter +docker-compose -f profile/deploy.yml logs -f query-exporter ``` ### Test Database Connectivity ```bash -docker-compose exec query-exporter sh +docker-compose -f profile/deploy.yml exec query-exporter sh apk add postgresql-client psql -h $POSTGRES_HOST -U $POSTGRES_USER -d $POSTGRES_DB ``` From 9a7ed2e0a11836c581a67cba40c00933cb907e77 Mon Sep 17 00:00:00 2001 From: Michael Rademaker Date: Wed, 14 Jan 2026 09:26:22 +0100 Subject: [PATCH 3/5] Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- docs/developer-guide/query-exporter.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/docs/developer-guide/query-exporter.md b/docs/developer-guide/query-exporter.md index 60fe43a7..49463ae8 100644 --- a/docs/developer-guide/query-exporter.md +++ b/docs/developer-guide/query-exporter.md @@ -20,8 +20,8 @@ The query-exporter service monitors the OpenRemote PostgreSQL database and expos - `pg_autovacuum_running` - Running autovacuum processes (labels: database, table_schema, table_name, phase) 
### Datapoint Query Performance -- `pg_datapoint_query_duration_seconds` - Histogram of execution times for the attribute with most datapoints -- `pg_datapoint_count` - Total number of datapoints for the top attribute +- `pg_datapoint_query_duration_seconds` - Histogram of execution times for the attribute that is automatically identified as having the highest datapoint count +- `pg_datapoint_count` - Total number of datapoints for the attribute that is automatically identified as having the highest datapoint count ### Database Health - `pg_database_size_megabytes` - Total database size in megabytes @@ -94,7 +94,10 @@ scrape_configs: To modify queries or add new metrics: -1. Edit the `config.yaml` file located in the `query-exporter` configuration directory (by default, this is `/deployment/query-exporter/config.yaml` which is mounted as a Docker volume at the container's `/config/config.yaml` path—see your `profile/deploy.yml` for the exact path). +1. Edit the `config.yaml` file in the `query-exporter` configuration directory. + - Default host path: `/deployment/query-exporter/config.yaml` + - Container path (Docker volume mount): `/config/config.yaml` + - For the exact host path in your environment, see the `query-exporter` volume mapping in `profile/deploy.yml`. 2. 
Restart the service: ```bash docker-compose -f profile/deploy.yml restart query-exporter @@ -130,7 +133,7 @@ If bloat detection queries impact database performance: ### Query Complexity - Bloat detection scans `pg_stats` and `pg_class` catalogs (limited to top 50 results) -- Datapoint performance samples 100 most recent datapoints from the largest attribute +- Datapoint performance uses a sample size of 100 recent datapoints from the largest attribute (configurable) - All queries exclude PostgreSQL system schemas (`pg_%` and `information_schema`) ## Understanding Bloat @@ -153,7 +156,7 @@ The bloat detection queries use these PostgreSQL internal constants: - `20` - Page header size in bytes - `12` - Index header overhead in bytes - `4` - Item pointer size in bytes -- `23` - Tuple header size for PostgreSQL 14+ (Linux) +- `23` - Typical tuple header size for PostgreSQL 14+ on Linux (this value may differ for other PostgreSQL versions or operating systems; verify for your deployment) - `4` - Memory alignment for Linux containers ## References From 855a0675f5258123a5bdfb06ca83ae20af6c8c6d Mon Sep 17 00:00:00 2001 From: Michael Date: Wed, 14 Jan 2026 09:35:21 +0100 Subject: [PATCH 4/5] Update query exporter documentation with Docker networking details --- docs/developer-guide/query-exporter.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/docs/developer-guide/query-exporter.md b/docs/developer-guide/query-exporter.md index 49463ae8..0972198a 100644 --- a/docs/developer-guide/query-exporter.md +++ b/docs/developer-guide/query-exporter.md @@ -61,7 +61,7 @@ export INDEX_BLOAT_THRESHOLD=2.0 # 100% index bloat ### Query Intervals - Table bloat queries: Every 5 minutes - Autovacuum queries: Every 30 seconds -- Datapoint performance: Every 60 seconds +- Datapoint performance: Every 60 seconds (samples 100 most recent datapoints) - Database size: Every 5 minutes - Connection/lock stats: Every 30 seconds @@ -86,10 +86,17 @@ Add this scrape 
configuration to your Prometheus config: scrape_configs: - job_name: 'openremote-postgres' static_configs: - - targets: ['localhost:9560'] + - targets: ['query-exporter:9560'] # Use service name in Docker network + # - targets: ['localhost:9560'] # Use localhost if Prometheus runs on host scrape_interval: 30s ``` +:::note + +When Prometheus runs in the same Docker network as OpenRemote, use the service name `query-exporter:9560`. Only use `localhost:9560` if Prometheus is running directly on the host machine. + +::: + ## Customizing Queries To modify queries or add new metrics: From fdad0143dbe1d45eb121a73a1b5e3c89d00b2fce Mon Sep 17 00:00:00 2001 From: Rich Turner <7072278+richturner@users.noreply.github.com> Date: Tue, 28 Apr 2026 11:14:06 +0100 Subject: [PATCH 5/5] WIP --- docs/user-guide/metrics.md | 259 ++++++++++++++++++++++++++++++++++++- 1 file changed, 257 insertions(+), 2 deletions(-) diff --git a/docs/user-guide/metrics.md b/docs/user-guide/metrics.md index de314480..6f29e36a 100644 --- a/docs/user-guide/metrics.md +++ b/docs/user-guide/metrics.md @@ -27,16 +27,17 @@ graph LR subgraph Docker [Docker Containers] direction TB - Manager["Manager
http://localhost:8404/metrics
- Micrometer with Prometheus Registry
- Runs on own embedded web server port 8404
- OR_METRICS_ENABLED: true/false"]:::greenStyle HAProxy["HA Proxy
http://localhost:8404/metrics
- Uses prometheus-exporter
- Runs on own embedded web server port 8404
- Configured via haproxy.cfg"]:::greenStyle + Manager["Manager
http://localhost:8405/metrics
- Micrometer with Prometheus Registry
- Runs on own embedded web server port 8404
- OR_METRICS_ENABLED: true/false"]:::greenStyle Keycloak["Keycloak
http://localhost:8080/metrics
- Built in prometheus metrics support
- KC_METRICS_ENABLED: true/false
- Do not publicly expose"]:::orangeStyle - PostgreSQL["PostgreSQL
- No metrics at present could use postgresql-exporter"]:::redStyle + PostgreSQL["PostgreSQL
http://localhost:8406/metrics
- Uses separate query-exporter docker container and config"]:::redStyle end end %% Connections PromScrape --> Manager PromScrape --> HAProxy + PromScape --> PostgreSQL CWAgent --> CW CW --> DB @@ -429,3 +430,257 @@ Refer to the website of each container app for details of metrics exposed and th + +## PostgreSQL (via Query Exporter) + + + +The following metrics are exposed by the Query Exporter, which connects directly to the OpenRemote PostgreSQL database to monitor TimescaleDB performance, connection limits, and general database health. The +following is based on the default configuration found in `/deployment/query-exporter/config.yaml`. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Metric name | Type | Labels | Description |
+|---|---|---|---|
+| `pg_collation_mismatch_count` | gauge | (none) | Number of text indexes with collation version mismatches requiring a `REINDEX` |
+| `pg_cache_hit_percentage` | gauge | (none) | Percentage of data served instantly from RAM rather than read slowly from disk; keep this as high as possible |
+| `pg_connections_limit` | gauge | (none) | Maximum configured connection limit |
+| `pg_connections_used` | gauge | (none) | Count of connections in use |
+| `pg_connections_free` | gauge | (none) | Count of connections available |
+| `pg_connections_stuck` | gauge | (none) | Count of connections in the `idle in transaction` state |
+| `pg_hot_update_percent` | gauge | `table_name` | Percentage of updates per table that are HOT updates; a high value indicates a good fillfactor |
+| `pg_dead_tuple_percent` | gauge | `table_name` | Ratio of dead tuples to live tuples per table; a ratio above 10-20% indicates autovacuum is not aggressive enough |
+| `pg_last_autovacuum_hours` | gauge | `table_name` | Hours since autovacuum last ran successfully on the table |
+| `pg_last_autoanalyze_hours` | gauge | `table_name` | Hours since autoanalyze last ran successfully on the table |
+| `pg_db_disk_size` | gauge | (none) | Database size in MB |
+| `pg_datapoint_raw_data_size` | gauge | (none) | Asset datapoint table raw uncompressed size in MB |
+| `pg_datapoint_indexes_size` | gauge | (none) | Asset datapoint table indexes size in MB |
+| `pg_datapoint_toast_size` | gauge | (none) | Asset datapoint TOAST table size in MB |
+| `pg_datapoint_disk_size` | gauge | (none) | Asset datapoint table size in MB |
+| `pg_datapoint_chunk_count` | gauge | (none) | Asset datapoint hypertable chunk count |
+| `pg_datapoint_uncompressed_chunk_count` | gauge | (none) | Asset datapoint hypertable uncompressed chunk count |
+| `pg_datapoint_chunks_needing_compression` | gauge | (none) | Count of asset datapoint hypertable chunks needing compression |
+| `pg_datapoint_chunk_start_weeks` | gauge | (none) | Age of the oldest asset datapoint hypertable chunk in weeks |
+| `pg_datapoint_chunk_end_weeks` | gauge | (none) | Age of the newest asset datapoint hypertable chunk in weeks |
+| `pg_datapoint_chunks_not_analyzed` | gauge | (none) | Count of asset datapoint hypertable chunks not yet analyzed |
+| `pg_datapoint_largest_uncompressed_chunk` | gauge | (none) | Largest uncompressed asset datapoint hypertable chunk in MB |
+| `pg_datapoint_uncompressed_cache_hit_ratio` | gauge | (none) | Cache hit ratio for uncompressed asset datapoint chunks (aim for 99%+) |
+| `pg_datapoint_uncompressed_blks_read_total` | counter | (none) | Total physical disk blocks read for uncompressed asset datapoint chunks; monitor the rate, as spikes indicate RAM spillover |
+| `pg_datapoint_compression_ratio` | gauge | (none) | Asset datapoint table compression ratio |
+| `pg_datapoint_query` | gauge | (none) | Dummy metric used to capture a typical query time |
+| `pg_background_errors` | counter | (none) | Count of errors in background worker processes |
+| `pg_timescale_job_total_runs` | counter | `job_id`, `proc_name` | Total runs per TimescaleDB job |
+| `pg_timescale_job_total_failures` | counter | `job_id`, `proc_name` | Total failures per TimescaleDB job |
+| `pg_timescale_job_last_run_duration_seconds` | gauge | `job_id`, `proc_name` | Duration of each TimescaleDB job's last run in seconds |
+| `pg_timescale_job_next_start_seconds` | gauge | `job_id`, `proc_name` | Seconds until the next scheduled run of each TimescaleDB job |
+| `pg_timescale_job_last_run_status` | gauge | `job_id`, `proc_name`, `last_run_status` | Last run status marker for each TimescaleDB job |
+| `pg_wal_total` | counter | (none) | Total WAL written since statistics reset in MB |
+| `pg_bgwriter_checkpoints_timed_total` | counter | (none) | Scheduled checkpoints executed |
+| `pg_bgwriter_checkpoints_req_total` | counter | (none) | Requested checkpoints executed |
+| `pg_bgwriter_checkpoint_write_time_seconds_total` | counter | (none) | Total time spent writing checkpoints in seconds |
+| `pg_bgwriter_checkpoint_sync_time_seconds_total` | counter | (none) | Total time spent syncing checkpoints in seconds |
+| `pg_table_bloat_count` | gauge | (none) | Number of tables where dead tuples exceed 30% of live rows |
+| `pg_index_bloat_count` | gauge | (none) | Number of indexes larger than 150% of their table size |
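+
+### Example Alerting Rules
+
+The metrics above lend themselves to Prometheus alerting. The following is a minimal sketch only: the group name, alert names, thresholds, and durations are illustrative assumptions, not part of the default configuration; tune them to your deployment.
+
+```yaml
+groups:
+  - name: openremote-postgres  # hypothetical group name
+    rules:
+      - alert: PgConnectionsNearLimit
+        # Fires when fewer than 10% of the configured connections remain free
+        expr: pg_connections_free / pg_connections_limit < 0.1
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "PostgreSQL connections nearly exhausted"
+      - alert: PgAutovacuumStalled
+        # Example threshold: no successful autovacuum on a table for over 48 hours
+        expr: pg_last_autovacuum_hours > 48
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Autovacuum has not run recently on {{ $labels.table_name }}"
+```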