
consul-replicate syncs empty data during raft leader election in the master DC #82

@vaLski

Description


Hello.

We ran into a very strange issue with consul-replicate, three times now.

consul-replicate v0.3.1 (172f34b) - configured with **max_stale = "5m"**
consul v0.7.5

During an outage and raft leader election in the main data center, consul-replicate in the follower data centers synchronized blank or incomplete data from the main DC.

As a result, a huge amount of data was erased from the prefixes we were replicating in the follower data centers. We experienced this with a prefix holding ~150,000 subkeys/records.

UPDATE: My latest tests show that this issue can NOT be reproduced if I configure consul-replicate without max_stale.
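
For context, my understanding of max_stale is that it allows reads to be served by any server, including followers, with the staleness of a response judged from the query metadata. Here is a rough sketch of that pattern using the official Go API client; the fallback logic is my assumption, not consul-replicate's actual code:

package main

import (
	"log"
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	maxStale := 5 * time.Minute

	// Stale-mode read: any server (leader or follower) may answer.
	opts := &api.QueryOptions{Datacenter: "main-dc", AllowStale: true}
	pairs, meta, err := client.KV().List("pub", opts)
	if err != nil {
		log.Fatal(err)
	}

	// The response metadata reports how far behind the answering server
	// is; a follower that lost contact with the leader (e.g. during an
	// election) can report a large LastContact and KnownLeader == false.
	if meta.LastContact > maxStale || !meta.KnownLeader {
		// Assumed fallback: retry as a default (leader-verified) read.
		opts.AllowStale = false
		pairs, _, err = client.KV().List("pub", opts)
		if err != nil {
			log.Fatal(err)
		}
	}

	log.Printf("got %d pairs", len(pairs))
}

If a LastContact / KnownLeader check like this is skipped or mishandled during an election, a lagging follower with an empty KV could be treated as authoritative, which would match what we saw.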

I suspect there might be something like a race condition, where the consul-replicate prefix watch is attached to a main DC follower that has not yet fully caught up with the master and whose KV DB was empty, which triggers a KV prefix watch event with blank data. In that case I am not really sure how it got past the following check:

// Get the prefix data
view, ok := r.get(prefix)
if !ok {
    log.Printf("[INFO] (runner) no data for %q", prefix.Dependency)
    doneCh <- struct{}{}
    return
}
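
Presumably the view exists in that scenario (the watch did fire) but simply contains zero pairs, so ok is true and the check passes. If so, a defensive sanity check before applying deletes might mitigate it; here is a hypothetical sketch (the function name and 50% threshold are mine, not consul-replicate code):

// Hypothetical guard: refuse to apply an update that would wipe out
// most of the previously replicated keys in a single pass.
func safeToApply(updates, deletes, existing int) bool {
	if existing == 0 {
		return true // nothing replicated yet, anything goes
	}
	// An empty snapshot from a lagging follower shows up as
	// "0 updates, N deletes" with N close to the existing key count.
	if updates == 0 && float64(deletes)/float64(existing) > 0.5 {
		return false // looks like blank source data; skip this pass
	}
	return true
}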

Here is a quick timeline of what happened, plus logs.

This was the last successful replication prior to the main DC outage:

20180313T084044.363696: 2018/03/13 08:40:44.363681 [INFO] (runner) replicated 0 updates, 23 deletes

The main data center goes down and consul-replicate in the follower data centers starts returning errors. Some identical log lines are skipped.

20180313T084122.659085: 2018/03/13 08:41:22.659073 [WARN] (view) kv.list(pub@main-dc): Unexpected response code: 500 (retry attempt 1 after "10s")

The retry was successful and some data was replicated:

20180313T084251.126752: 2018/03/13 08:42:51.126734 [INFO] (runner) quiescence timers starting
20180313T084251.276830: 2018/03/13 08:42:51.276821 [INFO] (runner) quiescence minTimer fired
20180313T084251.276842: 2018/03/13 08:42:51.276827 [INFO] (runner) running
20180313T084252.059906: 2018/03/13 08:42:52.059891 [INFO] (runner) replicated 49 updates, 53 deletes
20180313T084259.630092: 2018/03/13 08:42:59.630080 [INFO] (runner) quiescence timers starting
20180313T084259.780182: 2018/03/13 08:42:59.780168 [INFO] (runner) quiescence minTimer fired
20180313T084259.780193: 2018/03/13 08:42:59.780179 [INFO] (runner) running

The main data center goes down again:

20180313T084409.232366: 2018/03/13 08:44:09.232265 [WARN] (view) kv.list(pub@main-dc): Unexpected response code: 500 (retry attempt 4 after "1m20s")
20180313T085144.446896: 2018/03/13 08:51:44.446804 [WARN] (view) kv.list(pub@main-us): Unexpected response code: 500 (retry attempt 6 after "5m20s")

The very next log line shows consul-replicate synchronizing empty data from the main data center, resulting in all keys under the pub prefix being deleted:

20180313T090040.647493: 2018/03/13 09:00:40.647411 [INFO] (runner) replicated 0 updates, 127719 deletes

Also, after the outage in the main data center was resolved, consul-replicate refused to synchronize the proper data from the main DC KV store again: the stored LastReplicated index had advanced past the ModifyIndex of the (unchanged) source keys, so the following code skipped every key:

// Ignore if the modify index is old
if pair.ModifyIndex <= status.LastReplicated {
    log.Printf("[DEBUG] (runner) skipping because %q is already "+
        "replicated", key)
    continue
}
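
To make the failure mode concrete (numbers are illustrative, not from our logs):

// status.LastReplicated = 180000  <- recorded during the blank sync
// pair.ModifyIndex      = 150000  <- source key, unchanged since before the outage
// 150000 <= 180000 => key is skipped on every pass until the status is reset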

To force a re-synchronization I had to recursively erase all entries under the following prefix:

consul kv delete -recurse service/consul-replicate/statuses
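
For reference, the stored statuses can be inspected first to see what was recorded, before deleting them:

consul kv get -recurse service/consul-replicate/statuses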

I am not sure whether adjusting the following options could have prevented this (syncing blank data) from happening. I am still unsure how to read these timers exactly. Do they mean "force replication every N seconds", or am I getting this wrong? How do they relate to the cluster's consistent state?

# These are the quiescence timers; they define the minimum and maximum amount
# of time to wait for the cluster to reach a consistent state before
# replicating. This is useful to enable in systems that have a lot of flapping,
# because it will reduce the number of times a replication occurs.
wait {
  min = "5s"
  max = "10s"
}
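
My current guess at how such quiescence timers generally behave (a generic sketch using only the standard time package, not consul-replicate's actual code; please correct me if wrong): each change restarts the min timer, replication runs once things have been quiet for min, and max caps how long replication can be deferred while changes keep arriving. So they read as "wait for quiet", not "force replication every N seconds":

// Generic quiescence pattern (sketch): fire after "min" of silence,
// but never defer longer than "max" while changes keep streaming in.
func quiesce(changes <-chan struct{}, min, max time.Duration, run func()) {
	var minC, maxC <-chan time.Time // nil channels block in select
	for {
		select {
		case <-changes:
			minC = time.After(min) // every change restarts the min timer
			if maxC == nil {
				maxC = time.After(max) // armed once per burst of changes
			}
		case <-minC:
			run() // quiet for "min": presumed consistent, replicate
			minC, maxC = nil, nil
		case <-maxC:
			run() // churn never stopped, but "max" is the ceiling
			minC, maxC = nil, nil
		}
	}
}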

I am aware that the Consul version we are using is a bit old, but I suspect that the issue is more likely related to consul-replicate itself.
