feat(api): add /synced api to check changefeed synchronization status #2602

Merged: ti-chi-bot[bot] merged 28 commits into pingcap:master from wlwilliamx:feat/support-synced-api on Oct 15, 2025

@wlwilliamx (Collaborator) commented Oct 11, 2025

What problem does this PR solve?

Issue Number: close #1630

What is changed and how it works?

This PR introduces a new API endpoint GET /api/v2/changefeeds/{changefeed-id}/synced to allow users to reliably check if a changefeed has finished replicating all available upstream data.

The core of this feature is a new, more robust definition of "synced" that handles idle sources and stalled upstream regions, providing a more accurate status than just observing checkpoint lag.

Detailed Changes

1. New API Endpoint and Sync Logic

  • api/v2/changefeed.go: The primary logic for the feature resides in the new synced function.
    • It fetches the current time from PD to serve as a consistent reference point.
    • It retrieves the changefeed's CheckpointTs, LastSyncedTs, and LogCoordinatorResolvedTs.
    • It implements a three-part logic to determine the sync state:
      1. Strictly Synced: Returns synced: true if (now - LastSyncedTs) is greater than SyncedCheckInterval AND (now - CheckpointTs) is less than CheckpointInterval. This means no new data has been received for a while, but the sink is still actively checkpointing.
      2. Potentially Stalled: If both (now - LastSyncedTs) and (now - CheckpointTs) are large, the handler checks for an upstream issue by comparing LogCoordinatorResolvedTs with CheckpointTs.
        • If the gap is small, the source resolved-ts is probably not advancing, so the API returns an info message telling the user to check PD/TiKV health.
        • If the gap is large, it indicates genuine replication lag.
      3. Not Synced: In all other cases, such as when LastSyncedTs is very recent, it returns synced: false.

2. LastSyncedTs Propagation

To know when the last transaction was applied, a new timestamp, LastSyncedTs, is now tracked and propagated from the dispatcher up to the coordinator.

  • Dispatcher:
    • TableProgress now tracks lastSyncedTs by recording the commit timestamp of each flushed event. To prevent backward movement, it only stores the maximum value seen.
    • BasicDispatcher includes LastSyncedTs in its heartbeat information.
  • Maintainer:
    • The Maintainer receives heartbeats from all dispatchers and aggregates the LastSyncedTs. It updates its global watermark with the maximum LastSyncedTs received.
    • The MaintainerStatus now includes the aggregated LastSyncedTs.
  • Protobuf: The Watermark and MaintainerStatus messages in heartbeatpb have been updated to include the lastSyncedTs field.
  • Coordinator: The coordinator receives the MaintainerStatus and stores the LastSyncedTs in the ChangeFeedStatus struct.
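
As an illustration of the "only stores the maximum value seen" rule on the dispatcher side, here is a small Go sketch. `lastSyncedTracker` is a hypothetical type, not the actual TableProgress code, and the use of an atomic compare-and-swap loop is an assumption; the real implementation may guard the field with a mutex instead.

```go
package main

import "sync/atomic"

// lastSyncedTracker is a hypothetical stand-in for the lastSyncedTs
// bookkeeping described for TableProgress.
type lastSyncedTracker struct {
	ts atomic.Uint64
}

// Advance records commitTs only if it is larger than the stored value,
// so the tracked timestamp never moves backward even when flush
// callbacks arrive out of order.
func (t *lastSyncedTracker) Advance(commitTs uint64) {
	for {
		cur := t.ts.Load()
		if commitTs <= cur || t.ts.CompareAndSwap(cur, commitTs) {
			return
		}
	}
}

// Get returns the largest commit timestamp recorded so far.
func (t *lastSyncedTracker) Get() uint64 {
	return t.ts.Load()
}
```

A dispatcher would call `Advance` from its flush callback and read `Get` when it builds a heartbeat.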

3. On-Demand LogCoordinatorResolvedTs Retrieval

To get the most up-to-date puller progress, a new request-response mechanism has been implemented.

  • New Messages:
    • Defined LogCoordinatorResolvedTsRequest and LogCoordinatorResolvedTsResponse in heartbeat.proto.
    • Registered these new message types in the messaging service.
  • Coordinator/Controller:
    • The Controller now has a RequestResolvedTsFromLogCoordinator method. When called by the API handler, it broadcasts a request to all alive log coordinator nodes.
    • It then waits for a short period for a response. This ensures the API provides a fresh resolvedTs value rather than a potentially stale, periodically reported one.
  • Log Coordinator:
    • The logCoordinator module now handles the LogCoordinatorResolvedTsRequest.
    • Upon receiving a request, it immediately responds with its current minimum puller resolved timestamp for the given changefeed.

4. State Management and Struct Changes

  • coordinator/changefeed/changefeed.go: The Changefeed struct now includes logCoordinatorResolvedTs to cache the value received from the log coordinator.
  • coordinator/changefeed/changefeed_db.go: Added new methods to update and retrieve logCoordinatorResolvedTs and changefeedID by name.
  • pkg/config/changefeed.go: The ChangeFeedStatus struct was updated to include LastSyncedTs and LogCoordinatorResolvedTs (these are transient and not persisted to etcd).
  • heartbeatpb/watermark_util.go: A new Update method was added to handle the aggregation logic for the watermark, using min for checkpoint/resolved TS and max for LastSyncedTs.
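
The min/max aggregation described for the watermark can be illustrated as below. This is a sketch, not the code in heartbeatpb/watermark_util.go: the `Watermark` struct here carries only the three fields named in this description, and the `Update` signature and the `aggregate` helper are assumptions.

```go
package main

import "math"

// Watermark mirrors only the three fields named in the PR description.
type Watermark struct {
	CheckpointTs uint64
	ResolvedTs   uint64
	LastSyncedTs uint64
}

// Update folds other into w: the slowest dispatcher bounds the checkpoint
// and resolved ts (min), while the most recent flush anywhere bounds
// LastSyncedTs (max).
func (w *Watermark) Update(other Watermark) {
	if other.CheckpointTs < w.CheckpointTs {
		w.CheckpointTs = other.CheckpointTs
	}
	if other.ResolvedTs < w.ResolvedTs {
		w.ResolvedTs = other.ResolvedTs
	}
	if other.LastSyncedTs > w.LastSyncedTs {
		w.LastSyncedTs = other.LastSyncedTs
	}
}

// aggregate is a hypothetical helper that combines per-dispatcher watermarks.
// It starts from the identity of each operation: MaxUint64 for min, 0 for max.
func aggregate(ws []Watermark) Watermark {
	out := Watermark{CheckpointTs: math.MaxUint64, ResolvedTs: math.MaxUint64}
	for _, w := range ws {
		out.Update(w)
	}
	return out
}
```

For example, aggregating `{CheckpointTs: 10, ResolvedTs: 20, LastSyncedTs: 5}` with `{CheckpointTs: 8, ResolvedTs: 25, LastSyncedTs: 9}` yields `{CheckpointTs: 8, ResolvedTs: 20, LastSyncedTs: 9}`.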

5. Testing

  • Added two new integration test suites: synced_status and synced_status_with_redo.
  • These tests validate the API's behavior under various conditions:
    • Normal operation where the changefeed reaches a synced state.
    • When PD is unavailable.
    • When a TiKV node is unavailable.
  • The test scripts assert the correctness of the synced status and the info message returned by the API in each scenario.
  • Removed a now-redundant test case that used failpoints to block checkpoints, as the new tests provide more comprehensive coverage of unhealthy cluster scenarios.

Check List

Tests

  • Unit test
  • Integration test, add synced_status and synced_status_with_redo
  • Manual test
    curl -X GET http://127.0.0.1:8300/api/v2/changefeeds/test/synced
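
The same manual check can be scripted in Go. The sketch below assumes the response body is JSON with a boolean `synced` field and a string `info` field, which are the two values this description mentions; the exact JSON keys and any additional fields are assumptions, and `syncedResponse`, `parseSynced`, and `fetchSynced` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// syncedResponse models only the two values named in the PR description;
// the JSON keys are assumptions.
type syncedResponse struct {
	Synced bool   `json:"synced"`
	Info   string `json:"info"`
}

// parseSynced decodes a response body into syncedResponse.
func parseSynced(body []byte) (syncedResponse, error) {
	var out syncedResponse
	err := json.Unmarshal(body, &out)
	return out, err
}

// fetchSynced calls GET {baseURL}/api/v2/changefeeds/{id}/synced.
func fetchSynced(client *http.Client, baseURL, changefeedID string) (syncedResponse, error) {
	endpoint := fmt.Sprintf("%s/api/v2/changefeeds/%s/synced", baseURL, url.PathEscape(changefeedID))
	resp, err := client.Get(endpoint)
	if err != nil {
		return syncedResponse{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return syncedResponse{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out syncedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return syncedResponse{}, err
	}
	return out, nil
}
```

Calling `fetchSynced(http.DefaultClient, "http://127.0.0.1:8300", "test")` mirrors the curl command above.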

Questions

Will it cause performance regression or break compatibility?

None

Do you need to update user documentation, design documentation or monitoring documentation?

None

Release note

None

@ti-chi-bot ti-chi-bot Bot added the release-note-none, do-not-merge/needs-triage-completed, and size/XXL labels Oct 11, 2025
@gemini-code-assist

Summary of Changes

Hello @wlwilliamx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a crucial new API endpoint that enables users to ascertain the synchronization status of a changefeed. This feature is vital for operations requiring confirmed data consistency, such as database migrations or disaster recovery, by providing a clear, programmatic signal when a changefeed has caught up and no new data is actively being processed.

Highlights

  • New Synchronization API: Introduced a new API endpoint, GET /api/v2/changefeeds/{changefeed_id}/synced, allowing users to programmatically check if a changefeed has completed data synchronization.
  • Comprehensive Sync Logic: The API determines synchronization status by evaluating key timestamps: CheckpointTs, LastSyncedTs, PullerResolvedTs, and the current PD time, providing robust checks even when upstream data is inactive.
  • Enhanced Timestamp Propagation: Implemented mechanisms to propagate LastSyncedTs (last commit timestamp sent to sink) and PullerResolvedTs (resolved timestamp from the puller module) across various internal components (dispatcher, maintainer, log coordinator, controller) to support the new sync logic.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new API endpoint /api/v2/changefeeds/{changefeed_id}/synced to check if a changefeed is synchronized. The implementation involves propagating two new timestamps, LastSyncedTs and PullerResolvedTs, through the system. While the overall approach is sound, I've found a couple of significant issues in the implementation. The logic in the new API handler for determining the 'synced' status is misleading in some cases, and the aggregation of LastSyncedTs in the maintainer is incorrect, which could lead to stale data. I've provided detailed comments and suggestions to address these issues.

Comment thread api/v2/changefeed.go
Comment thread maintainer/maintainer.go
@wlwilliamx
Collaborator Author

/retest

@wlwilliamx
Collaborator Author

/test all

@wlwilliamx
Collaborator Author

/CC @hongyunyan

@ti-chi-bot ti-chi-bot Bot requested a review from hongyunyan October 11, 2025 11:59
@ti-chi-bot ti-chi-bot Bot added the lgtm label Oct 11, 2025
@ti-chi-bot

ti-chi-bot Bot commented Oct 11, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-10-11 12:45:02.048313936 +0000 UTC m=+530691.079413264: ☑️ agreed by flowbehappy.

@ti-chi-bot ti-chi-bot Bot added the approved label Oct 11, 2025
@hongyunyan
Collaborator

/hold

@ti-chi-bot ti-chi-bot Bot added the do-not-merge/hold label Oct 11, 2025
Comment thread coordinator/changefeed/changefeed.go Outdated
Comment thread coordinator/changefeed/changefeed.go Outdated
Comment thread coordinator/changefeed/changefeed_db.go Outdated
Comment thread pkg/config/changefeed.go Outdated
Comment thread pkg/etcd/etcd.go Outdated
Comment thread downstreamadapter/dispatchermanager/dispatcher_manager.go Outdated
Comment thread coordinator/controller.go Outdated
@wlwilliamx
Collaborator Author

/test all

@wlwilliamx
Collaborator Author

/test all

Comment thread logservice/coordinator/coordinator.go Outdated
Comment thread logservice/coordinator/coordinator.go Outdated
@wlwilliamx
Collaborator Author

/test all

@wlwilliamx
Collaborator Author

/test all

@wlwilliamx
Collaborator Author

/test all

@ti-chi-bot

ti-chi-bot Bot commented Oct 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flowbehappy, hongyunyan

@hongyunyan hongyunyan removed the do-not-merge/hold label Oct 15, 2025
@hongyunyan
Collaborator

/retest

@ti-chi-bot ti-chi-bot Bot merged commit 5dcf3bf into pingcap:master Oct 15, 2025
16 checks passed
Labels

approved, lgtm, release-note-none, size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a /synced API to verify changefeed data synchronization status

3 participants