[BUG]: master fails to restart if local journal is lost #532

@shuoranliu

Describe the bug
In a cluster with three master nodes, if one master pod is deleted together with its local meta and journal, the pod fails to restart and hits the raft panic shown below. As a result, it never rejoins the raft group and never catches up with the other members' commit index.

SERVER_TYPE: master, ACTION_TYPE: start
POD_IP: 
POD_NAMESPACE: 
POD_CLUSTER_DOMAIN: 
Loading environment variables from /app/curvine/conf/curvine-env.sh
Starting master service...
datetime: 2025-12-25 06:12:55.498, git version: 0e69244, args: ServerArgs {
    service: "master",
    conf: "/app/curvine/conf/curvine-cluster.toml",
}
Applying master hostname from env: 'cv-master-0.cv-master.curvine.svc.cluster.local'
2025-12-25 06:12:55.696 Panic occurred at /root/.cargo/registry/src/mirrors.aliyun.com-0671735e7cc7f5e7/raft-0.7.0/src/raft_log.rs:292: to_commit 1 is out of range [last_index 0], raft_id: 1, backtrace <disabled>
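
The panic message reads like a bounds check on the commit index: the node is asked to commit index 1 while its local log is empty (last_index 0). The sketch below is a minimal Rust model of such a check, written only to illustrate the failure condition. It is not the raft-rs source. `LogState` and its fields are hypothetical names, and it returns an error where the library aborts the process. The report does not establish where the `to_commit` value of 1 comes from, so the model takes it as an input.

```rust
/// Simplified stand-in for the commit bookkeeping of a Raft log.
/// Hypothetical type for illustration; not the raft-rs `RaftLog`.
#[derive(Debug, Default)]
pub struct LogState {
    /// Index of the last entry present in the local log (0 when the log is empty).
    pub last_index: u64,
    /// Highest index known locally to be committed.
    pub committed: u64,
}

impl LogState {
    /// Advance `committed` to `to_commit`, refusing any index the local log
    /// does not contain. Returns `Err` where the real library panics.
    pub fn commit_to(&mut self, to_commit: u64) -> Result<(), String> {
        // The commit index never moves backwards.
        if to_commit <= self.committed {
            return Ok(());
        }
        // An entry that is not in the local log cannot be marked committed.
        if to_commit > self.last_index {
            return Err(format!(
                "to_commit {} is out of range [last_index {}]",
                to_commit, self.last_index
            ));
        }
        self.committed = to_commit;
        Ok(())
    }
}
```

With a wiped journal the model starts from `LogState::default()`, so any positive `to_commit` is rejected. That matches the condition in the log line above, and it suggests the restarted node would need its log or a snapshot restored before a commit index from the rest of the group can be applied.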

To Reproduce

  1. Deploy a 3-master-node cluster.
  2. Kill one master pod and delete its data disk, or just its local meta and journal.
  3. Restart this pod with the same master hostname.

Expected behavior
The master pod restarts successfully and catches up with the raft commit index.
