diff --git a/vendor.conf b/vendor.conf index 484bbaefe9..4e81c7bb8d 100644 --- a/vendor.conf +++ b/vendor.conf @@ -19,8 +19,8 @@ github.com/grpc-ecosystem/go-grpc-prometheus 6b7015e65d366bf3f19b2b2a000a831940f github.com/docker/go-metrics d466d4f6fd960e01820085bd7e1a24426ee7ef18 # etcd/raft -github.com/coreos/etcd v3.2.1 -github.com/coreos/go-systemd v15 +github.com/coreos/etcd v3.3.9 +github.com/coreos/go-systemd v17 github.com/coreos/pkg v3 github.com/prometheus/client_golang 52437c81da6b127a9925d17eb3a382a2e5fd395e github.com/prometheus/client_model fa8ad6fec33561be4280a8f0514318c79d7f6cb6 diff --git a/vendor/github.com/coreos/etcd/README.md b/vendor/github.com/coreos/etcd/README.md index 8ab28492c7..2b559014e5 100644 --- a/vendor/github.com/coreos/etcd/README.md +++ b/vendor/github.com/coreos/etcd/README.md @@ -1,9 +1,12 @@ # etcd -[![Go Report Card](https://goreportcard.com/badge/github.com/coreos/etcd)](https://goreportcard.com/report/github.com/coreos/etcd) -[![Build Status](https://travis-ci.org/coreos/etcd.svg?branch=master)](https://travis-ci.org/coreos/etcd) -[![Build Status](https://semaphoreci.com/api/v1/coreos/etcd/branches/master/shields_badge.svg)](https://semaphoreci.com/coreos/etcd) -[![Docker Repository on Quay.io](https://quay.io/repository/coreos/etcd-git/status "Docker Repository on Quay.io")](https://quay.io/repository/coreos/etcd-git) +[![Go Report Card](https://goreportcard.com/badge/github.com/coreos/etcd?style=flat-square)](https://goreportcard.com/report/github.com/coreos/etcd) +[![Coverage](https://codecov.io/gh/coreos/etcd/branch/master/graph/badge.svg)](https://codecov.io/gh/coreos/etcd) +[![Build Status Travis](https://img.shields.io/travis/coreos/etcdlabs.svg?style=flat-square&&branch=master)](https://travis-ci.org/coreos/etcd) +[![Build Status Semaphore](https://semaphoreci.com/api/v1/coreos/etcd/branches/master/shields_badge.svg)](https://semaphoreci.com/coreos/etcd) +[![Godoc](http://img.shields.io/badge/go-documentation-blue.svg?style=flat-square)](https://godoc.org/github.com/coreos/etcd) +[![Releases](https://img.shields.io/github/release/coreos/etcd/all.svg?style=flat-square)](https://github.com/coreos/etcd/releases) +[![LICENSE](https://img.shields.io/github/license/coreos/etcd.svg?style=flat-square)](https://github.com/coreos/etcd/blob/master/LICENSE) **Note**: The `master` branch may be in an *unstable or even broken state* during development. Please use [releases][github-release] instead of the `master` branch in order to get stable binaries. @@ -33,13 +36,21 @@ See [etcdctl][etcdctl] for a simple command line client. [etcdctl]: https://github.com/coreos/etcd/tree/master/etcdctl [etcd-tests]: http://dash.etcd.io +## Community meetings + +etcd contributors and maintainers have bi-weekly meetings at 11:00 AM (USA Pacific) on Tuesdays. There is an [iCalendar][rfc5545] format for the meetings [here](meeting.ics). Anyone is welcome to join via [Zoom][zoom] or audio-only: +1 669 900 6833. An initial agenda will be posted to the [shared Google docs][shared-meeting-notes] a day before each meeting, and everyone is welcome to suggest additional topics or other agendas. + +[rfc5545]: https://tools.ietf.org/html/rfc5545 +[zoom]: https://coreos.zoom.us/j/854793406 +[shared-meeting-notes]: https://docs.google.com/document/d/1DbVXOHvd9scFsSmL2oNg4YGOHJdXqtx583DmeVWrB_M/edit# + ## Getting started ### Getting etcd The easiest way to get etcd is to use one of the pre-built release binaries which are available for OSX, Linux, Windows, [rkt][rkt], and Docker. Instructions for using these binaries are on the [GitHub releases page][github-release]. -For those wanting to try the very latest version, [build the latest version of etcd][dl-build] from the `master` branch. This first needs [*Go*](https://golang.org/) installed (version 1.8+ is required). All development occurs on `master`, including new features and bug fixes. Bug fixes are first targeted at `master` and subsequently ported to release branches, as described in the [branch management][branch-management] guide. +For those wanting to try the very latest version, [build the latest version of etcd][dl-build] from the `master` branch. This first needs [*Go*](https://golang.org/) installed (version 1.9+ is required). All development occurs on `master`, including new features and bug fixes. Bug fixes are first targeted at `master` and subsequently ported to release branches, as described in the [branch management][branch-management] guide. [rkt]: https://github.com/rkt/rkt/releases/ [github-release]: https://github.com/coreos/etcd/releases/ @@ -48,7 +59,22 @@ For those wanting to try the very latest version, [build the latest version of e ### Running etcd -First start a single-member cluster of etcd: +First start a single-member cluster of etcd. + +If etcd is installed using the [pre-built release binaries][github-release], run it from the installation location as below: + +```sh +/tmp/etcd-download-test/etcd +``` +The etcd command can be simply run as such if it is moved to the system path as below: + +```sh +mv /tmp/etcd-download-test/etcd /usr/locale/bin/ + +etcd +``` + +If etcd is [build from the master branch][dl-build], run it as below: ```sh ./bin/etcd @@ -87,7 +113,7 @@ Our [Procfile script](./Procfile) will set up a local example cluster. Start it goreman start ``` -This will bring up 3 etcd members `infra1`, `infra2` and `infra3` and etcd proxy `proxy`, which runs locally and composes a cluster. +This will bring up 3 etcd members `infra1`, `infra2` and `infra3` and etcd `grpc-proxy`, which runs locally and composes a cluster. Every cluster member and proxy accepts key value reads and key value writes. diff --git a/vendor/github.com/coreos/etcd/pkg/fileutil/lock_windows.go b/vendor/github.com/coreos/etcd/pkg/fileutil/lock_windows.go index 8698f4a8d1..b1817230a3 100644 --- a/vendor/github.com/coreos/etcd/pkg/fileutil/lock_windows.go +++ b/vendor/github.com/coreos/etcd/pkg/fileutil/lock_windows.go @@ -121,5 +121,5 @@ func lockFileEx(h syscall.Handle, flags, locklow, lockhigh uint32, ol *syscall.O err = syscall.EINVAL } } - return + return err } diff --git a/vendor/github.com/coreos/etcd/pkg/fileutil/preallocate_darwin.go b/vendor/github.com/coreos/etcd/pkg/fileutil/preallocate_darwin.go index 1ed09c560f..5a6dccfa79 100644 --- a/vendor/github.com/coreos/etcd/pkg/fileutil/preallocate_darwin.go +++ b/vendor/github.com/coreos/etcd/pkg/fileutil/preallocate_darwin.go @@ -30,6 +30,8 @@ func preallocExtend(f *os.File, sizeInBytes int64) error { } func preallocFixed(f *os.File, sizeInBytes int64) error { + // allocate all requested space or no space at all + // TODO: allocate contiguous space on disk with F_ALLOCATECONTIG flag fstore := &syscall.Fstore_t{ Flags: syscall.F_ALLOCATEALL, Posmode: syscall.F_PEOFPOSMODE, @@ -39,5 +41,25 @@ func preallocFixed(f *os.File, sizeInBytes int64) error { if errno == 0 || errno == syscall.ENOTSUP { return nil } + + // wrong argument to fallocate syscall + if errno == syscall.EINVAL { + // filesystem "st_blocks" are allocated in the units of + // "Allocation Block Size" (run "diskutil info /" command) + var stat syscall.Stat_t + syscall.Fstat(int(f.Fd()), &stat) + + // syscall.Statfs_t.Bsize is "optimal transfer block size" + // and contains matching 4096 value when latest OS X kernel + // supports 4,096 KB filesystem block size + var statfs syscall.Statfs_t + syscall.Fstatfs(int(f.Fd()), &statfs) + blockSize := int64(statfs.Bsize) + + if stat.Blocks*blockSize >= sizeInBytes { + // enough blocks are already allocated + return nil + } + } return errno } diff --git a/vendor/github.com/coreos/etcd/raft/README.md b/vendor/github.com/coreos/etcd/raft/README.md index f485b83977..fde22b1651 100644 --- a/vendor/github.com/coreos/etcd/raft/README.md +++ b/vendor/github.com/coreos/etcd/raft/README.md @@ -25,12 +25,12 @@ This raft implementation is a full feature implementation of Raft protocol. Feat - Membership changes - Leadership transfer extension - Efficient linearizable read-only queries served by both the leader and followers - - leader checks with quorum and bypasses Raft log before processing read-only queries - - followers asks leader to get a safe read index before processing read-only queries + - leader checks with quorum and bypasses Raft log before processing read-only queries + - followers asks leader to get a safe read index before processing read-only queries - More efficient lease-based linearizable read-only queries served by both the leader and followers - - leader bypasses Raft log and processing read-only queries locally - - followers asks leader to get a safe read index before processing read-only queries - - this approach relies on the clock of the all the machines in raft group + - leader bypasses Raft log and processing read-only queries locally + - followers asks leader to get a safe read index before processing read-only queries + - this approach relies on the clock of the all the machines in raft group This raft implementation also includes a few optional enhancements: @@ -112,7 +112,7 @@ After creating a Node, the user has a few responsibilities: First, read from the Node.Ready() channel and process the updates it contains. These steps may be performed in parallel, except as noted in step 2. -1. Write HardState, Entries, and Snapshot to persistent storage if they are not empty. Note that when writing an Entry with Index i, any previously-persisted entries with Index >= i must be discarded. +1. Write Entries, HardState and Snapshot to persistent storage in order, i.e. Entries first, then HardState and Snapshot if they are not empty. If persistent storage supports atomic writes then all of them can be written together. Note that when writing an Entry with Index i, any previously-persisted entries with Index >= i must be discarded. 2. Send all Messages to the nodes named in the To field. It is important that no messages be sent until the latest HardState has been persisted to disk, and all Entries written by any previous Ready batch (Messages may be sent while entries from the same batch are being persisted). To reduce the I/O latency, an optimization can be applied to make leader write to disk in parallel with its followers (as explained at section 10.2.1 in Raft thesis). If any Message has type MsgSnap, call Node.ReportSnapshot() after it has been sent (these messages may be large). Note: Marshalling messages is not thread-safe; it is important to make sure that no new entries are persisted while marshalling. The easiest way to achieve this is to serialise the messages directly inside the main raft loop. diff --git a/vendor/github.com/coreos/etcd/raft/node.go b/vendor/github.com/coreos/etcd/raft/node.go index 5da1c1193b..33a9db8400 100644 --- a/vendor/github.com/coreos/etcd/raft/node.go +++ b/vendor/github.com/coreos/etcd/raft/node.go @@ -15,10 +15,10 @@ package raft import ( + "context" "errors" pb "github.com/coreos/etcd/raft/raftpb" - "golang.org/x/net/context" ) type SnapshotStatus int @@ -319,7 +319,7 @@ func (n *node) run(r *raft) { r.Step(m) case m := <-n.recvc: // filter out response message from unknown From. - if _, ok := r.prs[m.From]; ok || !IsResponseMsg(m.Type) { + if pr := r.getProgress(m.From); pr != nil || !IsResponseMsg(m.Type) { r.Step(m) // raft never returns an error } case cc := <-n.confc: @@ -334,6 +334,8 @@ func (n *node) run(r *raft) { switch cc.Type { case pb.ConfChangeAddNode: r.addNode(cc.NodeID) + case pb.ConfChangeAddLearnerNode: + r.addLearner(cc.NodeID) case pb.ConfChangeRemoveNode: // block incoming proposal when local node is // removed diff --git a/vendor/github.com/coreos/etcd/raft/progress.go b/vendor/github.com/coreos/etcd/raft/progress.go index 77c7b52efe..ef3787db65 100644 --- a/vendor/github.com/coreos/etcd/raft/progress.go +++ b/vendor/github.com/coreos/etcd/raft/progress.go @@ -48,6 +48,7 @@ type Progress struct { // When in ProgressStateSnapshot, leader should have sent out snapshot // before and stops sending any replication message. State ProgressStateType + // Paused is used in ProgressStateProbe. // When Paused is true, raft should pause sending replication message to this peer. Paused bool @@ -76,6 +77,9 @@ type Progress struct { // be freed by calling inflights.freeTo with the index of the last // received entry. ins *inflights + + // IsLearner is true if this progress is tracked for a learner. + IsLearner bool } func (pr *Progress) resetState(state ProgressStateType) { @@ -243,7 +247,8 @@ func (in *inflights) freeTo(to uint64) { return } - i, idx := 0, in.start + idx := in.start + var i int for i = 0; i < in.count; i++ { if to < in.buffer[idx] { // found the first large inflight break diff --git a/vendor/github.com/coreos/etcd/raft/raft.go b/vendor/github.com/coreos/etcd/raft/raft.go index 29f2039820..b4c0f0248c 100644 --- a/vendor/github.com/coreos/etcd/raft/raft.go +++ b/vendor/github.com/coreos/etcd/raft/raft.go @@ -116,6 +116,10 @@ type Config struct { // used for testing right now. peers []uint64 + // learners contains the IDs of all leaner nodes (including self if the local node is a leaner) in the raft cluster. + // learners only receives entries from the leader node. It does not vote or promote itself. + learners []uint64 + // ElectionTick is the number of Node.Tick invocations that must pass between // elections. That is, if a follower does not receive any message from the // leader of current term before ElectionTick has elapsed, it will become @@ -171,11 +175,22 @@ type Config struct { // If the clock drift is unbounded, leader might keep the lease longer than it // should (clock can move backward/pause without any bound). ReadIndex is not safe // in that case. + // CheckQuorum MUST be enabled if ReadOnlyOption is ReadOnlyLeaseBased. ReadOnlyOption ReadOnlyOption // Logger is the logger used for raft log. For multinode which can host // multiple raft group, each raft group can have its own logger Logger Logger + + // DisableProposalForwarding set to true means that followers will drop + // proposals, rather than forwarding them to the leader. One use case for + // this feature would be in a situation where the Raft leader is used to + // compute the data of a proposal, for example, adding a timestamp from a + // hybrid logical clock to data in a monotonically increasing way. Forwarding + // should be disabled to prevent a follower with an innaccurate hybrid + // logical clock from assigning the timestamp and then forwarding the data + // to the leader. + DisableProposalForwarding bool } func (c *Config) validate() error { @@ -203,6 +218,10 @@ func (c *Config) validate() error { c.Logger = raftLogger } + if c.ReadOnlyOption == ReadOnlyLeaseBased && !c.CheckQuorum { + return errors.New("CheckQuorum must be enabled when ReadOnlyOption is ReadOnlyLeaseBased") + } + return nil } @@ -220,9 +239,13 @@ type raft struct { maxInflight int maxMsgSize uint64 prs map[uint64]*Progress + learnerPrs map[uint64]*Progress state StateType + // isLearner is true if the local raft node is a learner. + isLearner bool + votes map[uint64]bool msgs []pb.Message @@ -256,6 +279,7 @@ type raft struct { // [electiontimeout, 2 * electiontimeout - 1]. It gets reset // when raft changes its state to follower or candidate. randomizedElectionTimeout int + disableProposalForwarding bool tick func() step stepFunc @@ -273,32 +297,47 @@ func newRaft(c *Config) *raft { panic(err) // TODO(bdarnell) } peers := c.peers - if len(cs.Nodes) > 0 { - if len(peers) > 0 { + learners := c.learners + if len(cs.Nodes) > 0 || len(cs.Learners) > 0 { + if len(peers) > 0 || len(learners) > 0 { // TODO(bdarnell): the peers argument is always nil except in // tests; the argument should be removed and these tests should be // updated to specify their nodes through a snapshot. - panic("cannot specify both newRaft(peers) and ConfState.Nodes)") + panic("cannot specify both newRaft(peers, learners) and ConfState.(Nodes, Learners)") } peers = cs.Nodes + learners = cs.Learners } r := &raft{ - id: c.ID, - lead: None, - raftLog: raftlog, - maxMsgSize: c.MaxSizePerMsg, - maxInflight: c.MaxInflightMsgs, - prs: make(map[uint64]*Progress), - electionTimeout: c.ElectionTick, - heartbeatTimeout: c.HeartbeatTick, - logger: c.Logger, - checkQuorum: c.CheckQuorum, - preVote: c.PreVote, - readOnly: newReadOnly(c.ReadOnlyOption), + id: c.ID, + lead: None, + isLearner: false, + raftLog: raftlog, + maxMsgSize: c.MaxSizePerMsg, + maxInflight: c.MaxInflightMsgs, + prs: make(map[uint64]*Progress), + learnerPrs: make(map[uint64]*Progress), + electionTimeout: c.ElectionTick, + heartbeatTimeout: c.HeartbeatTick, + logger: c.Logger, + checkQuorum: c.CheckQuorum, + preVote: c.PreVote, + readOnly: newReadOnly(c.ReadOnlyOption), + disableProposalForwarding: c.DisableProposalForwarding, } for _, p := range peers { r.prs[p] = &Progress{Next: 1, ins: newInflights(r.maxInflight)} } + for _, p := range learners { + if _, ok := r.prs[p]; ok { + panic(fmt.Sprintf("node %x is in both learner and peer list", p)) + } + r.learnerPrs[p] = &Progress{Next: 1, ins: newInflights(r.maxInflight), IsLearner: true} + if r.id == p { + r.isLearner = true + } + } + if !isHardStateEqual(hs, emptyState) { r.loadState(hs) } @@ -332,10 +371,13 @@ func (r *raft) hardState() pb.HardState { func (r *raft) quorum() int { return len(r.prs)/2 + 1 } func (r *raft) nodes() []uint64 { - nodes := make([]uint64, 0, len(r.prs)) + nodes := make([]uint64, 0, len(r.prs)+len(r.learnerPrs)) for id := range r.prs { nodes = append(nodes, id) } + for id := range r.learnerPrs { + nodes = append(nodes, id) + } sort.Sort(uint64Slice(nodes)) return nodes } @@ -343,10 +385,20 @@ func (r *raft) nodes() []uint64 { // send persists state to stable storage and then sends to its mailbox. func (r *raft) send(m pb.Message) { m.From = r.id - if m.Type == pb.MsgVote || m.Type == pb.MsgPreVote { + if m.Type == pb.MsgVote || m.Type == pb.MsgVoteResp || m.Type == pb.MsgPreVote || m.Type == pb.MsgPreVoteResp { if m.Term == 0 { - // PreVote RPCs are sent at a term other than our actual term, so the code - // that sends these messages is responsible for setting the term. + // All {pre-,}campaign messages need to have the term set when + // sending. + // - MsgVote: m.Term is the term the node is campaigning for, + // non-zero as we increment the term when campaigning. + // - MsgVoteResp: m.Term is the new r.Term if the MsgVote was + // granted, non-zero for the same reason MsgVote is + // - MsgPreVote: m.Term is the term the node will campaign, + // non-zero as we use m.Term to indicate the next term we'll be + // campaigning for + // - MsgPreVoteResp: m.Term is the term received in the original + // MsgPreVote if the pre-vote was granted, non-zero for the + // same reasons MsgPreVote is panic(fmt.Sprintf("term should be set when sending %s", m.Type)) } } else { @@ -364,9 +416,17 @@ func (r *raft) send(m pb.Message) { r.msgs = append(r.msgs, m) } +func (r *raft) getProgress(id uint64) *Progress { + if pr, ok := r.prs[id]; ok { + return pr + } + + return r.learnerPrs[id] +} + // sendAppend sends RPC, with entries to the given peer. func (r *raft) sendAppend(to uint64) { - pr := r.prs[to] + pr := r.getProgress(to) if pr.IsPaused() { return } @@ -431,7 +491,7 @@ func (r *raft) sendHeartbeat(to uint64, ctx []byte) { // or it might not have all the committed entries. // The leader MUST NOT forward the follower's commit to // an unmatched index. - commit := min(r.prs[to].Match, r.raftLog.committed) + commit := min(r.getProgress(to).Match, r.raftLog.committed) m := pb.Message{ To: to, Type: pb.MsgHeartbeat, @@ -442,15 +502,26 @@ func (r *raft) sendHeartbeat(to uint64, ctx []byte) { r.send(m) } +func (r *raft) forEachProgress(f func(id uint64, pr *Progress)) { + for id, pr := range r.prs { + f(id, pr) + } + + for id, pr := range r.learnerPrs { + f(id, pr) + } +} + // bcastAppend sends RPC, with entries to all peers that are not up-to-date // according to the progress recorded in r.prs. func (r *raft) bcastAppend() { - for id := range r.prs { + r.forEachProgress(func(id uint64, _ *Progress) { if id == r.id { - continue + return } + r.sendAppend(id) - } + }) } // bcastHeartbeat sends RPC, without entries to all the peers. @@ -464,12 +535,12 @@ func (r *raft) bcastHeartbeat() { } func (r *raft) bcastHeartbeatWithCtx(ctx []byte) { - for id := range r.prs { + r.forEachProgress(func(id uint64, _ *Progress) { if id == r.id { - continue + return } r.sendHeartbeat(id, ctx) - } + }) } // maybeCommit attempts to advance the commit index. Returns true if @@ -478,8 +549,8 @@ func (r *raft) bcastHeartbeatWithCtx(ctx []byte) { func (r *raft) maybeCommit() bool { // TODO(bmizerany): optimize.. Currently naive mis := make(uint64Slice, 0, len(r.prs)) - for id := range r.prs { - mis = append(mis, r.prs[id].Match) + for _, p := range r.prs { + mis = append(mis, p.Match) } sort.Sort(sort.Reverse(mis)) mci := mis[r.quorum()-1] @@ -500,12 +571,13 @@ func (r *raft) reset(term uint64) { r.abortLeaderTransfer() r.votes = make(map[uint64]bool) - for id := range r.prs { - r.prs[id] = &Progress{Next: r.raftLog.lastIndex() + 1, ins: newInflights(r.maxInflight)} + r.forEachProgress(func(id uint64, pr *Progress) { + *pr = Progress{Next: r.raftLog.lastIndex() + 1, ins: newInflights(r.maxInflight), IsLearner: pr.IsLearner} if id == r.id { - r.prs[id].Match = r.raftLog.lastIndex() + pr.Match = r.raftLog.lastIndex() } - } + }) + r.pendingConf = false r.readOnly = newReadOnly(r.readOnly.option) } @@ -517,7 +589,7 @@ func (r *raft) appendEntry(es ...pb.Entry) { es[i].Index = li + 1 + uint64(i) } r.raftLog.append(es...) - r.prs[r.id].maybeUpdate(r.raftLog.lastIndex()) + r.getProgress(r.id).maybeUpdate(r.raftLog.lastIndex()) // Regardless of maybeCommit's return, our caller will call bcastAppend. r.maybeCommit() } @@ -589,6 +661,7 @@ func (r *raft) becomePreCandidate() { // but doesn't change anything else. In particular it does not increase // r.Term or change r.Vote. r.step = stepCandidate + r.votes = make(map[uint64]bool) r.tick = r.tickElection r.state = StatePreCandidate r.logger.Infof("%x became pre-candidate at term %d", r.id, r.Term) @@ -682,7 +755,6 @@ func (r *raft) Step(m pb.Message) error { case m.Term == 0: // local message case m.Term > r.Term: - lead := m.From if m.Type == pb.MsgVote || m.Type == pb.MsgPreVote { force := bytes.Equal(m.Context, []byte(campaignTransfer)) inLease := r.checkQuorum && r.lead != None && r.electionElapsed < r.electionTimeout @@ -693,7 +765,6 @@ func (r *raft) Step(m pb.Message) error { r.id, r.raftLog.lastTerm(), r.raftLog.lastIndex(), r.Vote, m.Type, m.From, m.LogTerm, m.Index, r.Term, r.electionTimeout-r.electionElapsed) return nil } - lead = None } switch { case m.Type == pb.MsgPreVote: @@ -707,7 +778,11 @@ func (r *raft) Step(m pb.Message) error { default: r.logger.Infof("%x [term: %d] received a %s message with higher term from %x [term: %d]", r.id, r.Term, m.Type, m.From, m.Term) - r.becomeFollower(m.Term, lead) + if m.Type == pb.MsgApp || m.Type == pb.MsgHeartbeat || m.Type == pb.MsgSnap { + r.becomeFollower(m.Term, m.From) + } else { + r.becomeFollower(m.Term, None) + } } case m.Term < r.Term: @@ -757,12 +832,27 @@ func (r *raft) Step(m pb.Message) error { } case pb.MsgVote, pb.MsgPreVote: + if r.isLearner { + // TODO: learner may need to vote, in case of node down when confchange. + r.logger.Infof("%x [logterm: %d, index: %d, vote: %x] ignored %s from %x [logterm: %d, index: %d] at term %d: learner can not vote", + r.id, r.raftLog.lastTerm(), r.raftLog.lastIndex(), r.Vote, m.Type, m.From, m.LogTerm, m.Index, r.Term) + return nil + } // The m.Term > r.Term clause is for MsgPreVote. For MsgVote m.Term should // always equal r.Term. if (r.Vote == None || m.Term > r.Term || r.Vote == m.From) && r.raftLog.isUpToDate(m.Index, m.LogTerm) { r.logger.Infof("%x [logterm: %d, index: %d, vote: %x] cast %s for %x [logterm: %d, index: %d] at term %d", r.id, r.raftLog.lastTerm(), r.raftLog.lastIndex(), r.Vote, m.Type, m.From, m.LogTerm, m.Index, r.Term) - r.send(pb.Message{To: m.From, Type: voteRespMsgType(m.Type)}) + // When responding to Msg{Pre,}Vote messages we include the term + // from the message, not the local term. To see why consider the + // case where a single node was previously partitioned away and + // it's local term is now of date. If we include the local term + // (recall that for pre-votes we don't update the local term), the + // (pre-)campaigning node on the other end will proceed to ignore + // the message (it ignores all out of date messages). + // The term in the original message and current local term are the + // same in the case of regular votes, but different for pre-votes. + r.send(pb.Message{To: m.From, Term: m.Term, Type: voteRespMsgType(m.Type)}) if m.Type == pb.MsgVote { // Only record real votes. r.electionElapsed = 0 @@ -771,7 +861,7 @@ func (r *raft) Step(m pb.Message) error { } else { r.logger.Infof("%x [logterm: %d, index: %d, vote: %x] rejected %s from %x [logterm: %d, index: %d] at term %d", r.id, r.raftLog.lastTerm(), r.raftLog.lastIndex(), r.Vote, m.Type, m.From, m.LogTerm, m.Index, r.Term) - r.send(pb.Message{To: m.From, Type: voteRespMsgType(m.Type), Reject: true}) + r.send(pb.Message{To: m.From, Term: r.Term, Type: voteRespMsgType(m.Type), Reject: true}) } default: @@ -836,10 +926,7 @@ func stepLeader(r *raft, m pb.Message) { r.readOnly.addRequest(r.raftLog.committed, m) r.bcastHeartbeatWithCtx(m.Entries[0].Data) case ReadOnlyLeaseBased: - var ri uint64 - if r.checkQuorum { - ri = r.raftLog.committed - } + ri := r.raftLog.committed if m.From == None || m.From == r.id { // from local member r.readStates = append(r.readStates, ReadState{Index: r.raftLog.committed, RequestCtx: m.Entries[0].Data}) } else { @@ -854,8 +941,8 @@ func stepLeader(r *raft, m pb.Message) { } // All other message types require a progress for m.From (pr). - pr, prOk := r.prs[m.From] - if !prOk { + pr := r.getProgress(m.From) + if pr == nil { r.logger.Debugf("%x no progress available for %x", r.id, m.From) return } @@ -954,6 +1041,10 @@ func stepLeader(r *raft, m pb.Message) { } r.logger.Debugf("%x failed to send message to %x because it is unreachable [%s]", r.id, m.From, pr) case pb.MsgTransferLeader: + if pr.IsLearner { + r.logger.Debugf("%x is learner. Ignored transferring leadership", r.id) + return + } leadTransferee := m.From lastLeadTransferee := r.leadTransferee if lastLeadTransferee != None { @@ -1033,6 +1124,9 @@ func stepFollower(r *raft, m pb.Message) { if r.lead == None { r.logger.Infof("%x no leader at term %d; dropping proposal", r.id, r.Term) return + } else if r.disableProposalForwarding { + r.logger.Infof("%x not forwarding to leader %x at term %d; dropping proposal", r.id, r.lead, r.Term) + return } m.To = r.lead r.send(m) @@ -1127,20 +1221,37 @@ func (r *raft) restore(s pb.Snapshot) bool { return false } + // The normal peer can't become learner. + if !r.isLearner { + for _, id := range s.Metadata.ConfState.Learners { + if id == r.id { + r.logger.Errorf("%x can't become learner when restores snapshot [index: %d, term: %d]", r.id, s.Metadata.Index, s.Metadata.Term) + return false + } + } + } + r.logger.Infof("%x [commit: %d, lastindex: %d, lastterm: %d] starts to restore snapshot [index: %d, term: %d]", r.id, r.raftLog.committed, r.raftLog.lastIndex(), r.raftLog.lastTerm(), s.Metadata.Index, s.Metadata.Term) r.raftLog.restore(s) r.prs = make(map[uint64]*Progress) - for _, n := range s.Metadata.ConfState.Nodes { + r.learnerPrs = make(map[uint64]*Progress) + r.restoreNode(s.Metadata.ConfState.Nodes, false) + r.restoreNode(s.Metadata.ConfState.Learners, true) + return true +} + +func (r *raft) restoreNode(nodes []uint64, isLearner bool) { + for _, n := range nodes { match, next := uint64(0), r.raftLog.lastIndex()+1 if n == r.id { match = next - 1 + r.isLearner = isLearner } - r.setProgress(n, match, next) - r.logger.Infof("%x restored progress of %x [%s]", r.id, n, r.prs[n]) + r.setProgress(n, match, next, isLearner) + r.logger.Infof("%x restored progress of %x [%s]", r.id, n, r.getProgress(n)) } - return true } // promotable indicates whether state machine can be promoted to leader, @@ -1151,18 +1262,46 @@ func (r *raft) promotable() bool { } func (r *raft) addNode(id uint64) { + r.addNodeOrLearnerNode(id, false) +} + +func (r *raft) addLearner(id uint64) { + r.addNodeOrLearnerNode(id, true) +} + +func (r *raft) addNodeOrLearnerNode(id uint64, isLearner bool) { r.pendingConf = false - if _, ok := r.prs[id]; ok { - // Ignore any redundant addNode calls (which can happen because the - // initial bootstrapping entries are applied twice). - return + pr := r.getProgress(id) + if pr == nil { + r.setProgress(id, 0, r.raftLog.lastIndex()+1, isLearner) + } else { + if isLearner && !pr.IsLearner { + // can only change Learner to Voter + r.logger.Infof("%x ignored addLeaner: do not support changing %x from raft peer to learner.", r.id, id) + return + } + + if isLearner == pr.IsLearner { + // Ignore any redundant addNode calls (which can happen because the + // initial bootstrapping entries are applied twice). + return + } + + // change Learner to Voter, use origin Learner progress + delete(r.learnerPrs, id) + pr.IsLearner = false + r.prs[id] = pr + } + + if r.id == id { + r.isLearner = isLearner } - r.setProgress(id, 0, r.raftLog.lastIndex()+1) // When a node is first added, we should mark it as recently active. // Otherwise, CheckQuorum may cause us to step down if it is invoked // before the added node has a chance to communicate with us. - r.prs[id].RecentActive = true + pr = r.getProgress(id) + pr.RecentActive = true } func (r *raft) removeNode(id uint64) { @@ -1170,7 +1309,7 @@ func (r *raft) removeNode(id uint64) { r.pendingConf = false // do not try to commit or abort transferring if there is no nodes in the cluster. - if len(r.prs) == 0 { + if len(r.prs) == 0 && len(r.learnerPrs) == 0 { return } @@ -1187,12 +1326,22 @@ func (r *raft) removeNode(id uint64) { func (r *raft) resetPendingConf() { r.pendingConf = false } -func (r *raft) setProgress(id, match, next uint64) { - r.prs[id] = &Progress{Next: next, Match: match, ins: newInflights(r.maxInflight)} +func (r *raft) setProgress(id, match, next uint64, isLearner bool) { + if !isLearner { + delete(r.learnerPrs, id) + r.prs[id] = &Progress{Next: next, Match: match, ins: newInflights(r.maxInflight)} + return + } + + if _, ok := r.prs[id]; ok { + panic(fmt.Sprintf("%x unexpected changing from voter to learner for %x", r.id, id)) + } + r.learnerPrs[id] = &Progress{Next: next, Match: match, ins: newInflights(r.maxInflight), IsLearner: true} } func (r *raft) delProgress(id uint64) { delete(r.prs, id) + delete(r.learnerPrs, id) } func (r *raft) loadState(state pb.HardState) { @@ -1222,18 +1371,18 @@ func (r *raft) resetRandomizedElectionTimeout() { func (r *raft) checkQuorumActive() bool { var act int - for id := range r.prs { + r.forEachProgress(func(id uint64, pr *Progress) { if id == r.id { // self is always active act++ - continue + return } - if r.prs[id].RecentActive { + if pr.RecentActive && !pr.IsLearner { act++ } - r.prs[id].RecentActive = false - } + pr.RecentActive = false + }) return act >= r.quorum() } diff --git a/vendor/github.com/coreos/etcd/raft/raftpb/raft.pb.go b/vendor/github.com/coreos/etcd/raft/raftpb/raft.pb.go index 3c45eef003..fd9ee3729e 100644 --- a/vendor/github.com/coreos/etcd/raft/raftpb/raft.pb.go +++ b/vendor/github.com/coreos/etcd/raft/raftpb/raft.pb.go @@ -1,6 +1,5 @@ -// Code generated by protoc-gen-gogo. +// Code generated by protoc-gen-gogo. DO NOT EDIT. // source: raft.proto -// DO NOT EDIT! /* Package raftpb is a generated protocol buffer package. @@ -26,6 +25,8 @@ import ( math "math" + _ "github.com/gogo/protobuf/gogoproto" + io "io" ) @@ -162,20 +163,23 @@ func (MessageType) EnumDescriptor() ([]byte, []int) { return fileDescriptorRaft, type ConfChangeType int32 const ( - ConfChangeAddNode ConfChangeType = 0 - ConfChangeRemoveNode ConfChangeType = 1 - ConfChangeUpdateNode ConfChangeType = 2 + ConfChangeAddNode ConfChangeType = 0 + ConfChangeRemoveNode ConfChangeType = 1 + ConfChangeUpdateNode ConfChangeType = 2 + ConfChangeAddLearnerNode ConfChangeType = 3 ) var ConfChangeType_name = map[int32]string{ 0: "ConfChangeAddNode", 1: "ConfChangeRemoveNode", 2: "ConfChangeUpdateNode", + 3: "ConfChangeAddLearnerNode", } var ConfChangeType_value = map[string]int32{ - "ConfChangeAddNode": 0, - "ConfChangeRemoveNode": 1, - "ConfChangeUpdateNode": 2, + "ConfChangeAddNode": 0, + "ConfChangeRemoveNode": 1, + "ConfChangeUpdateNode": 2, + "ConfChangeAddLearnerNode": 3, } func (x ConfChangeType) Enum() *ConfChangeType { @@ -267,6 +271,7 @@ func (*HardState) Descriptor() ([]byte, []int) { return fileDescriptorRaft, []in type ConfState struct { Nodes []uint64 `protobuf:"varint,1,rep,name=nodes" json:"nodes,omitempty"` + Learners []uint64 `protobuf:"varint,2,rep,name=learners" json:"learners,omitempty"` XXX_unrecognized []byte `json:"-"` } @@ -537,6 +542,13 @@ func (m *ConfState) MarshalTo(dAtA []byte) (int, error) { i = encodeVarintRaft(dAtA, i, uint64(num)) } } + if len(m.Learners) > 0 { + for _, num := range m.Learners { + dAtA[i] = 0x10 + i++ + i = encodeVarintRaft(dAtA, i, uint64(num)) + } + } if m.XXX_unrecognized != nil { i += copy(dAtA[i:], m.XXX_unrecognized) } @@ -579,24 +591,6 @@ func (m *ConfChange) MarshalTo(dAtA []byte) (int, error) { return i, nil } -func encodeFixed64Raft(dAtA []byte, offset int, v uint64) int { - dAtA[offset] = uint8(v) - dAtA[offset+1] = uint8(v >> 8) - dAtA[offset+2] = uint8(v >> 16) - dAtA[offset+3] = uint8(v >> 24) - dAtA[offset+4] = uint8(v >> 32) - dAtA[offset+5] = uint8(v >> 40) - dAtA[offset+6] = uint8(v >> 48) - dAtA[offset+7] = uint8(v >> 56) - return offset + 8 -} -func encodeFixed32Raft(dAtA []byte, offset int, v uint32) int { - dAtA[offset] = uint8(v) - dAtA[offset+1] = uint8(v >> 8) - dAtA[offset+2] = uint8(v >> 16) - dAtA[offset+3] = uint8(v >> 24) - return offset + 4 -} func encodeVarintRaft(dAtA []byte, offset int, v uint64) int { for v >= 1<<7 { dAtA[offset] = uint8(v&0x7f | 0x80) @@ -700,6 +694,11 @@ func (m *ConfState) Size() (n int) { n += 1 + sovRaft(uint64(e)) } } + if len(m.Learners) > 0 { + for _, e := range m.Learners { + n += 1 + sovRaft(uint64(e)) + } + } if m.XXX_unrecognized != nil { n += len(m.XXX_unrecognized) } @@ -1558,25 +1557,129 @@ func (m *ConfState) Unmarshal(dAtA []byte) error { } switch fieldNum { case 1: - if wireType != 0 { + if wireType == 0 { + var v uint64 + for shift := uint(0); ; shift += 7 { + if shift >= 64 { + return ErrIntOverflowRaft + } + if iNdEx >= l { + return io.ErrUnexpectedEOF + } + b := dAtA[iNdEx] + iNdEx++ + v |= (uint64(b) & 0x7F) << shift + if b < 0x80 { + break + } + } + m.Nodes = append(m.Nodes, v) + } else if wireType == 2 { + var packedLen int + for shift := uint(0); ; shift += 7 { + if shift >= 64 { + return ErrIntOverflowRaft + } + if iNdEx >= l { + return io.ErrUnexpectedEOF + } + b := dAtA[iNdEx] + iNdEx++ + packedLen |= (int(b) & 0x7F) << shift + if b < 0x80 { + break + } + } + if packedLen < 0 { + return ErrInvalidLengthRaft + } + postIndex := iNdEx + packedLen + if postIndex > l { + return io.ErrUnexpectedEOF + } + for iNdEx < postIndex { + var v uint64 + for shift := uint(0); ; shift += 7 { + if shift >= 64 { + return ErrIntOverflowRaft + } + if iNdEx >= l { + return io.ErrUnexpectedEOF + } + b := dAtA[iNdEx] + iNdEx++ + v |= (uint64(b) & 0x7F) << shift + if b < 0x80 { + break + } + } + m.Nodes = append(m.Nodes, v) + } + } else { return fmt.Errorf("proto: wrong wireType = %d for field Nodes", wireType) } - var v uint64 - for shift := uint(0); ; shift += 7 { - if shift >= 64 { - return ErrIntOverflowRaft + case 2: + if wireType == 0 { + var v uint64 + for shift := uint(0); ; shift += 7 { + if shift >= 64 { + return ErrIntOverflowRaft + } + if iNdEx >= l { + return io.ErrUnexpectedEOF + } + b := dAtA[iNdEx] + iNdEx++ + v |= (uint64(b) & 0x7F) << shift + if b < 0x80 { + break + } } - if iNdEx >= l { + m.Learners = append(m.Learners, v) + } else if wireType == 2 { + var packedLen int + for shift := uint(0); ; shift += 7 { + if shift >= 64 { + return ErrIntOverflowRaft + } + if iNdEx >= l { + return io.ErrUnexpectedEOF + } + b := dAtA[iNdEx] + iNdEx++ + packedLen |= (int(b) & 0x7F) << shift + if b < 0x80 { + break + } + } + if packedLen < 0 { + return ErrInvalidLengthRaft + } + postIndex := iNdEx + packedLen + if postIndex > l { return io.ErrUnexpectedEOF } - b := dAtA[iNdEx] - iNdEx++ - v |= (uint64(b) & 0x7F) << shift - if b < 0x80 { - break + for iNdEx < postIndex { + var v uint64 + for shift := uint(0); ; shift += 7 { + if shift >= 64 { + return ErrIntOverflowRaft + } + if iNdEx >= l { + return io.ErrUnexpectedEOF + } + b := dAtA[iNdEx] + iNdEx++ + v |= (uint64(b) & 0x7F) << shift + if b < 0x80 { + break + } + } + m.Learners = append(m.Learners, v) } + } else { + return fmt.Errorf("proto: wrong wireType = %d for field Learners", wireType) } - m.Nodes = append(m.Nodes, v) default: iNdEx = preIndex skippy, err := skipRaft(dAtA[iNdEx:]) @@ -1846,55 +1949,56 @@ var ( func init() { proto.RegisterFile("raft.proto", fileDescriptorRaft) } var fileDescriptorRaft = []byte{ - // 790 bytes of a gzipped FileDescriptorProto - 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xff, 0x64, 0x54, 0xcd, 0x6e, 0xdb, 0x46, - 0x10, 0x16, 0x29, 0xea, 0x6f, 0x28, 0xcb, 0xab, 0xb5, 0x5a, 0x2c, 0x0c, 0x43, 0x55, 0x85, 0x1e, - 0x04, 0x17, 0x76, 0x5b, 0x1d, 0x7a, 0xe8, 0xcd, 0x96, 0x0a, 0x58, 0x40, 0x65, 0xb8, 0xb2, 0xdc, - 0x43, 0x83, 0x20, 0x58, 0x8b, 0x2b, 0x4a, 0x89, 0xc9, 0x25, 0x96, 0x2b, 0xc7, 0xbe, 0x04, 0x79, - 0x80, 0x3c, 0x40, 0x2e, 0x79, 0x1f, 0x1f, 0x0d, 0xe4, 0x1e, 0xc4, 0xce, 0x8b, 0x04, 0xbb, 0x5c, - 0x4a, 0x94, 0x74, 0xdb, 0xf9, 0xbe, 0xe1, 0xcc, 0x37, 0xdf, 0xce, 0x12, 0x40, 0xd0, 0xa9, 0x3c, - 0x8e, 0x04, 0x97, 0x1c, 0x17, 0xd5, 0x39, 0xba, 0xde, 0x6f, 0xf8, 0xdc, 0xe7, 0x1a, 0xfa, 0x4d, - 0x9d, 0x12, 0xb6, 0xfd, 0x0e, 0x0a, 0x7f, 0x87, 0x52, 0xdc, 0xe3, 0x5f, 0xc1, 0x19, 0xdf, 0x47, - 0x8c, 0x58, 0x2d, 0xab, 0x53, 0xeb, 0xd6, 0x8f, 0x93, 0xaf, 0x8e, 0x35, 0xa9, 0x88, 0x53, 0xe7, - 0xe1, 0xcb, 0x4f, 0xb9, 0x91, 0x4e, 0xc2, 0x04, 0x9c, 0x31, 0x13, 0x01, 0xb1, 0x5b, 0x56, 0xc7, - 0x59, 0x32, 0x4c, 0x04, 0x78, 0x1f, 0x0a, 0x83, 0xd0, 0x63, 0x77, 0x24, 0x9f, 0xa1, 0x12, 0x08, - 0x63, 0x70, 0xfa, 0x54, 0x52, 0xe2, 0xb4, 0xac, 0x4e, 0x75, 0xa4, 0xcf, 0xed, 0xf7, 0x16, 0xa0, - 0xcb, 0x90, 0x46, 0xf1, 0x8c, 0xcb, 0x21, 0x93, 0xd4, 0xa3, 0x92, 0xe2, 0x3f, 0x01, 0x26, 0x3c, - 0x9c, 0xbe, 0x8a, 0x25, 0x95, 0x89, 0x22, 0x77, 0xa5, 0xa8, 0xc7, 0xc3, 0xe9, 0xa5, 0x22, 0x4c, - 0xf1, 0xca, 0x24, 0x05, 0x54, 0xf3, 0xb9, 0x6e, 0x9e, 0xd5, 0x95, 0x40, 0x4a, 0xb2, 0x54, 0x92, - 0xb3, 0xba, 0x34, 0xd2, 0xfe, 0x1f, 0xca, 0xa9, 0x02, 0x25, 0x51, 0x29, 0xd0, 0x3d, 0xab, 0x23, - 0x7d, 0xc6, 0x7f, 0x41, 0x39, 0x30, 0xca, 0x74, 0x61, 0xb7, 0x4b, 0x52, 0x2d, 0x9b, 0xca, 0x4d, - 0xdd, 0x65, 0x7e, 0xfb, 0x53, 0x1e, 0x4a, 0x43, 0x16, 0xc7, 0xd4, 0x67, 0xf8, 0x08, 0x1c, 0xb9, - 0x72, 0x78, 0x2f, 0xad, 0x61, 0xe8, 0xac, 0xc7, 0x2a, 0x0d, 0x37, 0xc0, 0x96, 0x7c, 0x6d, 0x12, - 0x5b, 0x72, 0x35, 0xc6, 0x54, 0xf0, 0x8d, 0x31, 0x14, 0xb2, 0x1c, 0xd0, 0xd9, 0x1c, 0x10, 0x37, - 0xa1, 0x74, 0xc3, 0x7d, 0x7d, 0x61, 0x85, 0x0c, 0x99, 0x82, 0x2b, 0xdb, 0x8a, 0xdb, 0xb6, 0x1d, - 0x41, 0x89, 0x85, 0x52, 0xcc, 0x59, 0x4c, 0x4a, 0xad, 0x7c, 0xc7, 0xed, 0xee, 0xac, 0x6d, 0x46, - 0x5a, 0xca, 0xe4, 0xe0, 0x03, 0x28, 0x4e, 0x78, 0x10, 0xcc, 0x25, 0x29, 0x67, 0x6a, 0x19, 0x0c, - 0x77, 0xa1, 0x1c, 0x1b, 0xc7, 0x48, 0x45, 0x3b, 0x89, 0x36, 0x9d, 0x4c, 0x1d, 0x4c, 0xf3, 0x54, - 0x45, 0xc1, 0x5e, 0xb3, 0x89, 0x24, 0xd0, 0xb2, 0x3a, 0xe5, 0xb4, 0x62, 0x82, 0xe1, 0x5f, 0x00, - 0x92, 0xd3, 0xd9, 0x3c, 0x94, 0xc4, 0xcd, 0xf4, 0xcc, 0xe0, 0x98, 0x40, 0x69, 0xc2, 0x43, 0xc9, - 0xee, 0x24, 0xa9, 0xea, 0x8b, 0x4d, 0xc3, 0xf6, 0x4b, 0xa8, 0x9c, 0x51, 0xe1, 0x25, 0xeb, 0x93, - 0x3a, 0x68, 0x6d, 0x39, 0x48, 0xc0, 0xb9, 0xe5, 0x92, 0xad, 0xef, 0xbb, 0x42, 0x32, 0x03, 0xe7, - 0xb7, 0x07, 0x6e, 0xff, 0x0c, 0x95, 0xe5, 0xba, 0xe2, 0x06, 0x14, 0x42, 0xee, 0xb1, 0x98, 0x58, - 0xad, 0x7c, 0xc7, 0x19, 0x25, 0x41, 0xfb, 0x83, 0x05, 0xa0, 0x72, 0x7a, 0x33, 0x1a, 0xfa, 0xfa, - 0xd6, 0x07, 0xfd, 0x35, 0x05, 0xf6, 0xa0, 0x8f, 0x7f, 0x37, 0x8f, 0xd3, 0xd6, 0xab, 0xf3, 0x63, - 0xf6, 0x29, 0x24, 0xdf, 0x6d, 0xbd, 0xd0, 0x03, 0x28, 0x9e, 0x73, 0x8f, 0x0d, 0xfa, 0xeb, 0xba, - 0x12, 0x4c, 0x19, 0xd2, 0x33, 0x86, 0x24, 0x8f, 0x31, 0x0d, 0x0f, 0xff, 0x80, 0xca, 0xf2, 0xc9, - 0xe3, 0x5d, 0x70, 0x75, 0x70, 0xce, 0x45, 0x40, 0x6f, 0x50, 0x0e, 0xef, 0xc1, 0xae, 0x06, 0x56, - 0x8d, 0x91, 0x75, 0xf8, 0xd9, 0x06, 0x37, 0xb3, 0xc4, 0x18, 0xa0, 0x38, 0x8c, 0xfd, 0xb3, 0x45, - 0x84, 0x72, 0xd8, 0x85, 0xd2, 0x30, 0xf6, 0x4f, 0x19, 0x95, 0xc8, 0x32, 0xc1, 0x85, 0xe0, 0x11, - 0xb2, 0x4d, 0xd6, 0x49, 0x14, 0xa1, 0x3c, 0xae, 0x01, 0x24, 0xe7, 0x11, 0x8b, 0x23, 0xe4, 0x98, - 0xc4, 0xff, 0xb8, 0x64, 0xa8, 0xa0, 0x44, 0x98, 0x40, 0xb3, 0x45, 0xc3, 0xaa, 0x85, 0x41, 0x25, - 0x8c, 0xa0, 0xaa, 0x9a, 0x31, 0x2a, 0xe4, 0xb5, 0xea, 0x52, 0xc6, 0x0d, 0x40, 0x59, 0x44, 0x7f, - 0x54, 0xc1, 0x18, 0x6a, 0xc3, 0xd8, 0xbf, 0x0a, 0x05, 0xa3, 0x93, 0x19, 0xbd, 0xbe, 0x61, 0x08, - 0x70, 0x1d, 0x76, 0x4c, 0x21, 0x75, 0x41, 0x8b, 0x18, 0xb9, 0x26, 0xad, 0x37, 0x63, 0x93, 0x37, - 0xff, 0x2e, 0xb8, 0x58, 0x04, 0xa8, 0x8a, 0x7f, 0x80, 0xfa, 0x30, 0xf6, 0xc7, 0x82, 0x86, 0xf1, - 0x94, 0x89, 0x7f, 0x18, 0xf5, 0x98, 0x40, 0x3b, 0xe6, 0xeb, 0xf1, 0x3c, 0x60, 0x7c, 0x21, 0xcf, - 0xf9, 0x5b, 0x54, 0x33, 0x62, 0x46, 0x8c, 0x7a, 0xfa, 0x87, 0x87, 0x76, 0x8d, 0x98, 0x25, 0xa2, - 0xc5, 0x20, 0x33, 0xef, 0x85, 0x60, 0x7a, 0xc4, 0xba, 0xe9, 0x6a, 0x62, 0x9d, 0x83, 0x0f, 0x5f, - 0x40, 0x6d, 0xfd, 0x7a, 0x95, 0x8e, 0x15, 0x72, 0xe2, 0x79, 0xea, 0x2e, 0x51, 0x0e, 0x13, 0x68, - 0xac, 0xe0, 0x11, 0x0b, 0xf8, 0x2d, 0xd3, 0x8c, 0xb5, 0xce, 0x5c, 0x45, 0x1e, 0x95, 0x09, 0x63, - 0x9f, 0x92, 0x87, 0xa7, 0x66, 0xee, 0xf1, 0xa9, 0x99, 0x7b, 0x78, 0x6e, 0x5a, 0x8f, 0xcf, 0x4d, - 0xeb, 0xeb, 0x73, 0xd3, 0xfa, 0xf8, 0xad, 0x99, 0xfb, 0x1e, 0x00, 0x00, 0xff, 0xff, 0xcf, 0x30, - 0x01, 0x41, 0x3a, 0x06, 0x00, 0x00, + // 815 bytes of a gzipped FileDescriptorProto + 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xff, 0x64, 0x54, 0xcd, 0x6e, 0x23, 0x45, + 0x10, 0xf6, 0x8c, 0xc7, 0x7f, 0x35, 0x8e, 0xd3, 0xa9, 0x35, 0xa8, 0x15, 0x45, 0xc6, 0xb2, 0x38, + 0x58, 0x41, 0x1b, 0x20, 0x07, 0x0e, 0x48, 0x1c, 0x36, 0x09, 0x52, 0x22, 0xad, 0xa3, 0xc5, 0x9b, + 0xe5, 0x80, 0x84, 0x50, 0xc7, 0x53, 0x9e, 0x18, 0x32, 0xd3, 0xa3, 0x9e, 0xf6, 0xb2, 0xb9, 0x20, + 0x1e, 0x80, 0x07, 0xe0, 0xc2, 0xfb, 0xe4, 0xb8, 0x12, 0x77, 0xc4, 0x86, 0x17, 0x41, 0xdd, 0xd3, + 0x63, 0xcf, 0x24, 0xb7, 0xae, 0xef, 0xab, 0xae, 0xfa, 0xea, 0xeb, 0x9a, 0x01, 0x50, 0x62, 0xa9, + 0x8f, 0x32, 0x25, 0xb5, 0xc4, 0xb6, 0x39, 0x67, 0xd7, 0xfb, 0xc3, 0x58, 0xc6, 0xd2, 0x42, 0x9f, + 0x9b, 0x53, 0xc1, 0x4e, 0x7e, 0x83, 0xd6, 0xb7, 0xa9, 0x56, 0x77, 0xf8, 0x19, 0x04, 0x57, 0x77, + 0x19, 0x71, 0x6f, 0xec, 0x4d, 0x07, 0xc7, 0x7b, 0x47, 0xc5, 0xad, 0x23, 0x4b, 0x1a, 0xe2, 0x24, + 0xb8, 0xff, 0xe7, 0x93, 0xc6, 0xdc, 0x26, 0x21, 0x87, 0xe0, 0x8a, 0x54, 0xc2, 0xfd, 0xb1, 0x37, + 0x0d, 0x36, 0x0c, 0xa9, 0x04, 0xf7, 0xa1, 0x75, 0x91, 0x46, 0xf4, 0x8e, 0x37, 0x2b, 0x54, 0x01, + 0x21, 0x42, 0x70, 0x26, 0xb4, 0xe0, 0xc1, 0xd8, 0x9b, 0xf6, 0xe7, 0xf6, 0x3c, 0xf9, 0xdd, 0x03, + 0xf6, 0x3a, 0x15, 0x59, 0x7e, 0x23, 0xf5, 0x8c, 0xb4, 0x88, 0x84, 0x16, 0xf8, 0x15, 0xc0, 0x42, + 0xa6, 0xcb, 0x9f, 0x72, 0x2d, 0x74, 0xa1, 0x28, 0xdc, 0x2a, 0x3a, 0x95, 0xe9, 0xf2, 0xb5, 0x21, + 0x5c, 0xf1, 0xde, 0xa2, 0x04, 0x4c, 0xf3, 0x95, 0x6d, 0x5e, 0xd5, 0x55, 0x40, 0x46, 0xb2, 0x36, + 0x92, 0xab, 0xba, 0x2c, 0x32, 0xf9, 0x01, 0xba, 0xa5, 0x02, 0x23, 0xd1, 0x28, 0xb0, 0x3d, 0xfb, + 0x73, 0x7b, 0xc6, 0xaf, 0xa1, 0x9b, 0x38, 0x65, 0xb6, 0x70, 0x78, 0xcc, 0x4b, 0x2d, 0x8f, 0x95, + 0xbb, 0xba, 0x9b, 0xfc, 0xc9, 0x5f, 0x4d, 0xe8, 0xcc, 0x28, 0xcf, 0x45, 0x4c, 0xf8, 0x1c, 0x02, + 0xbd, 0x75, 0xf8, 0x59, 0x59, 0xc3, 0xd1, 0x55, 0x8f, 0x4d, 0x1a, 0x0e, 0xc1, 0xd7, 0xb2, 0x36, + 0x89, 0xaf, 0xa5, 0x19, 0x63, 0xa9, 0xe4, 0xa3, 0x31, 0x0c, 0xb2, 0x19, 0x30, 0x78, 0x3c, 0x20, + 0x8e, 0xa0, 0x73, 0x2b, 0x63, 0xfb, 0x60, 0xad, 0x0a, 0x59, 0x82, 0x5b, 0xdb, 0xda, 0x4f, 0x6d, + 0x7b, 0x0e, 0x1d, 0x4a, 0xb5, 0x5a, 0x51, 0xce, 0x3b, 0xe3, 0xe6, 0x34, 0x3c, 0xde, 0xa9, 0x6d, + 0x46, 0x59, 0xca, 0xe5, 0xe0, 0x01, 0xb4, 0x17, 0x32, 0x49, 0x56, 0x9a, 0x77, 0x2b, 0xb5, 0x1c, + 0x86, 0xc7, 0xd0, 0xcd, 0x9d, 0x63, 0xbc, 0x67, 0x9d, 0x64, 0x8f, 0x9d, 0x2c, 0x1d, 0x2c, 0xf3, + 0x4c, 0x45, 0x45, 0x3f, 0xd3, 0x42, 0x73, 0x18, 0x7b, 0xd3, 0x6e, 0x59, 0xb1, 0xc0, 0xf0, 0x53, + 0x80, 0xe2, 0x74, 0xbe, 0x4a, 0x35, 0x0f, 0x2b, 0x3d, 0x2b, 0x38, 0x72, 0xe8, 0x2c, 0x64, 0xaa, + 0xe9, 0x9d, 0xe6, 0x7d, 0xfb, 0xb0, 0x65, 0x38, 0xf9, 0x11, 0x7a, 0xe7, 0x42, 0x45, 0xc5, 0xfa, + 0x94, 0x0e, 0x7a, 0x4f, 0x1c, 0xe4, 0x10, 0xbc, 0x95, 0x9a, 0xea, 0xfb, 0x6e, 0x90, 0xca, 0xc0, + 0xcd, 0xa7, 0x03, 0x4f, 0xbe, 0x81, 0xde, 0x66, 0x5d, 0x71, 0x08, 0xad, 0x54, 0x46, 0x94, 0x73, + 0x6f, 0xdc, 0x9c, 0x06, 0xf3, 0x22, 0xc0, 0x7d, 0xe8, 0xde, 0x92, 0x50, 0x29, 0xa9, 0x9c, 0xfb, + 0x96, 0xd8, 0xc4, 0x93, 0x3f, 0x3c, 0x00, 0x73, 0xff, 0xf4, 0x46, 0xa4, 0xb1, 0xdd, 0x88, 0x8b, + 0xb3, 0x9a, 0x3a, 0xff, 0xe2, 0x0c, 0xbf, 0x70, 0x1f, 0xae, 0x6f, 0xd7, 0xea, 0xe3, 0xea, 0x67, + 0x52, 0xdc, 0x7b, 0xf2, 0xf5, 0x1e, 0x40, 0xfb, 0x52, 0x46, 0x74, 0x71, 0x56, 0xd7, 0x5c, 0x60, + 0xc6, 0xac, 0x53, 0x67, 0x56, 0xf1, 0xa1, 0x96, 0xe1, 0xe1, 0x97, 0xd0, 0xdb, 0xfc, 0x0e, 0x70, + 0x17, 0x42, 0x1b, 0x5c, 0x4a, 0x95, 0x88, 0x5b, 0xd6, 0xc0, 0x67, 0xb0, 0x6b, 0x81, 0x6d, 0x63, + 0xe6, 0x1d, 0xfe, 0xed, 0x43, 0x58, 0x59, 0x70, 0x04, 0x68, 0xcf, 0xf2, 0xf8, 0x7c, 0x9d, 0xb1, + 0x06, 0x86, 0xd0, 0x99, 0xe5, 0xf1, 0x09, 0x09, 0xcd, 0x3c, 0x17, 0xbc, 0x52, 0x32, 0x63, 0xbe, + 0xcb, 0x7a, 0x91, 0x65, 0xac, 0x89, 0x03, 0x80, 0xe2, 0x3c, 0xa7, 0x3c, 0x63, 0x81, 0x4b, 0xfc, + 0x5e, 0x6a, 0x62, 0x2d, 0x23, 0xc2, 0x05, 0x96, 0x6d, 0x3b, 0xd6, 0x2c, 0x13, 0xeb, 0x20, 0x83, + 0xbe, 0x69, 0x46, 0x42, 0xe9, 0x6b, 0xd3, 0xa5, 0x8b, 0x43, 0x60, 0x55, 0xc4, 0x5e, 0xea, 0x21, + 0xc2, 0x60, 0x96, 0xc7, 0x6f, 0x52, 0x45, 0x62, 0x71, 0x23, 0xae, 0x6f, 0x89, 0x01, 0xee, 0xc1, + 0x8e, 0x2b, 0x64, 0x1e, 0x6f, 0x9d, 0xb3, 0xd0, 0xa5, 0x9d, 0xde, 0xd0, 0xe2, 0x97, 0xef, 0xd6, + 0x52, 0xad, 0x13, 0xd6, 0xc7, 0x8f, 0x60, 0x6f, 0x96, 0xc7, 0x57, 0x4a, 0xa4, 0xf9, 0x92, 0xd4, + 0x4b, 0x12, 0x11, 0x29, 0xb6, 0xe3, 0x6e, 0x5f, 0xad, 0x12, 0x92, 0x6b, 0x7d, 0x29, 0x7f, 0x65, + 0x03, 0x27, 0x66, 0x4e, 0x22, 0xb2, 0x3f, 0x43, 0xb6, 0xeb, 0xc4, 0x6c, 0x10, 0x2b, 0x86, 0xb9, + 0x79, 0x5f, 0x29, 0xb2, 0x23, 0xee, 0xb9, 0xae, 0x2e, 0xb6, 0x39, 0x78, 0x78, 0x07, 0x83, 0xfa, + 0xf3, 0x1a, 0x1d, 0x5b, 0xe4, 0x45, 0x14, 0x99, 0xb7, 0x64, 0x0d, 0xe4, 0x30, 0xdc, 0xc2, 0x73, + 0x4a, 0xe4, 0x5b, 0xb2, 0x8c, 0x57, 0x67, 0xde, 0x64, 0x91, 0xd0, 0x05, 0xe3, 0xe3, 0x01, 0xf0, + 0x5a, 0xa9, 0x97, 0xc5, 0x36, 0x5a, 0xb6, 0x79, 0xc2, 0xef, 0x3f, 0x8c, 0x1a, 0xef, 0x3f, 0x8c, + 0x1a, 0xf7, 0x0f, 0x23, 0xef, 0xfd, 0xc3, 0xc8, 0xfb, 0xf7, 0x61, 0xe4, 0xfd, 0xf9, 0xdf, 0xa8, + 0xf1, 0x7f, 0x00, 0x00, 0x00, 0xff, 0xff, 0x86, 0x52, 0x5b, 0xe0, 0x74, 0x06, 0x00, 0x00, } diff --git a/vendor/github.com/coreos/etcd/raft/raftpb/raft.proto b/vendor/github.com/coreos/etcd/raft/raftpb/raft.proto index 806a43634f..644ce7b8f2 100644 --- a/vendor/github.com/coreos/etcd/raft/raftpb/raft.proto +++ b/vendor/github.com/coreos/etcd/raft/raftpb/raft.proto @@ -76,13 +76,15 @@ message HardState { } message ConfState { - repeated uint64 nodes = 1; + repeated uint64 nodes = 1; + repeated uint64 learners = 2; } enum ConfChangeType { - ConfChangeAddNode = 0; - ConfChangeRemoveNode = 1; - ConfChangeUpdateNode = 2; + ConfChangeAddNode = 0; + ConfChangeRemoveNode = 1; + ConfChangeUpdateNode = 2; + ConfChangeAddLearnerNode = 3; } message ConfChange { diff --git a/vendor/github.com/coreos/etcd/raft/rawnode.go b/vendor/github.com/coreos/etcd/raft/rawnode.go index b950d5169a..925cb851c4 100644 --- a/vendor/github.com/coreos/etcd/raft/rawnode.go +++ b/vendor/github.com/coreos/etcd/raft/rawnode.go @@ -175,6 +175,8 @@ func (rn *RawNode) ApplyConfChange(cc pb.ConfChange) *pb.ConfState { switch cc.Type { case pb.ConfChangeAddNode: rn.raft.addNode(cc.NodeID) + case pb.ConfChangeAddLearnerNode: + rn.raft.addLearner(cc.NodeID) case pb.ConfChangeRemoveNode: rn.raft.removeNode(cc.NodeID) case pb.ConfChangeUpdateNode: @@ -191,7 +193,7 @@ func (rn *RawNode) Step(m pb.Message) error { if IsLocalMsg(m.Type) { return ErrStepLocalMsg } - if _, ok := rn.raft.prs[m.From]; ok || !IsResponseMsg(m.Type) { + if pr := rn.raft.getProgress(m.From); pr != nil || !IsResponseMsg(m.Type) { return rn.raft.Step(m) } return ErrStepPeerNotFound diff --git a/vendor/github.com/coreos/etcd/raft/read_only.go b/vendor/github.com/coreos/etcd/raft/read_only.go index d0085237e3..ae746fa73e 100644 --- a/vendor/github.com/coreos/etcd/raft/read_only.go +++ b/vendor/github.com/coreos/etcd/raft/read_only.go @@ -18,7 +18,7 @@ import pb "github.com/coreos/etcd/raft/raftpb" // ReadState provides state for read only query. // It's caller's responsibility to call ReadIndex first before getting -// this state from ready, It's also caller's duty to differentiate if this +// this state from ready, it's also caller's duty to differentiate if this // state is what it requests through RequestCtx, eg. given a unique id as // RequestCtx type ReadState struct { diff --git a/vendor/github.com/coreos/etcd/raft/status.go b/vendor/github.com/coreos/etcd/raft/status.go index b690fa56b9..f4d3d86a4e 100644 --- a/vendor/github.com/coreos/etcd/raft/status.go +++ b/vendor/github.com/coreos/etcd/raft/status.go @@ -28,11 +28,17 @@ type Status struct { Applied uint64 Progress map[uint64]Progress + + LeadTransferee uint64 } // getStatus gets a copy of the current raft status. func getStatus(r *raft) Status { - s := Status{ID: r.id} + s := Status{ + ID: r.id, + LeadTransferee: r.leadTransferee, + } + s.HardState = r.hardState() s.SoftState = *r.softState() @@ -43,6 +49,10 @@ func getStatus(r *raft) Status { for id, p := range r.prs { s.Progress[id] = *p } + + for id, p := range r.learnerPrs { + s.Progress[id] = *p + } } return s @@ -51,19 +61,21 @@ func getStatus(r *raft) Status { // MarshalJSON translates the raft status into JSON. // TODO: try to simplify this by introducing ID type into raft func (s Status) MarshalJSON() ([]byte, error) { - j := fmt.Sprintf(`{"id":"%x","term":%d,"vote":"%x","commit":%d,"lead":"%x","raftState":%q,"progress":{`, - s.ID, s.Term, s.Vote, s.Commit, s.Lead, s.RaftState) + j := fmt.Sprintf(`{"id":"%x","term":%d,"vote":"%x","commit":%d,"lead":"%x","raftState":%q,"applied":%d,"progress":{`, + s.ID, s.Term, s.Vote, s.Commit, s.Lead, s.RaftState, s.Applied) if len(s.Progress) == 0 { - j += "}}" + j += "}," } else { for k, v := range s.Progress { subj := fmt.Sprintf(`"%x":{"match":%d,"next":%d,"state":%q},`, k, v.Match, v.Next, v.State) j += subj } // remove the trailing "," - j = j[:len(j)-1] + "}}" + j = j[:len(j)-1] + "}," } + + j += fmt.Sprintf(`"leadtransferee":"%x"}`, s.LeadTransferee) return []byte(j), nil } diff --git a/vendor/github.com/coreos/etcd/snap/snappb/snap.pb.go b/vendor/github.com/coreos/etcd/snap/snappb/snap.pb.go index 05a77ff9d0..e72b577f5b 100644 --- a/vendor/github.com/coreos/etcd/snap/snappb/snap.pb.go +++ b/vendor/github.com/coreos/etcd/snap/snappb/snap.pb.go @@ -1,6 +1,5 @@ -// Code generated by protoc-gen-gogo. +// Code generated by protoc-gen-gogo. DO NOT EDIT. // source: snap.proto -// DO NOT EDIT! /* Package snappb is a generated protocol buffer package. @@ -20,6 +19,8 @@ import ( math "math" + _ "github.com/gogo/protobuf/gogoproto" + io "io" ) @@ -78,24 +79,6 @@ func (m *Snapshot) MarshalTo(dAtA []byte) (int, error) { return i, nil } -func encodeFixed64Snap(dAtA []byte, offset int, v uint64) int { - dAtA[offset] = uint8(v) - dAtA[offset+1] = uint8(v >> 8) - dAtA[offset+2] = uint8(v >> 16) - dAtA[offset+3] = uint8(v >> 24) - dAtA[offset+4] = uint8(v >> 32) - dAtA[offset+5] = uint8(v >> 40) - dAtA[offset+6] = uint8(v >> 48) - dAtA[offset+7] = uint8(v >> 56) - return offset + 8 -} -func encodeFixed32Snap(dAtA []byte, offset int, v uint32) int { - dAtA[offset] = uint8(v) - dAtA[offset+1] = uint8(v >> 8) - dAtA[offset+2] = uint8(v >> 16) - dAtA[offset+3] = uint8(v >> 24) - return offset + 4 -} func encodeVarintSnap(dAtA []byte, offset int, v uint64) int { for v >= 1<<7 { dAtA[offset] = uint8(v&0x7f | 0x80) diff --git a/vendor/github.com/coreos/etcd/wal/decoder.go b/vendor/github.com/coreos/etcd/wal/decoder.go index 0d9b4428c9..6a217f897b 100644 --- a/vendor/github.com/coreos/etcd/wal/decoder.go +++ b/vendor/github.com/coreos/etcd/wal/decoder.go @@ -29,6 +29,9 @@ import ( const minSectorSize = 512 +// frameSizeBytes is frame size in bytes, including record size and padding size. +const frameSizeBytes = 8 + type decoder struct { mu sync.Mutex brs []*bufio.Reader @@ -104,7 +107,7 @@ func (d *decoder) decodeRecord(rec *walpb.Record) error { } } // record decoded as valid; point last valid offset to end of record - d.lastValidOff += recBytes + padBytes + 8 + d.lastValidOff += frameSizeBytes + recBytes + padBytes return nil } @@ -116,7 +119,7 @@ func decodeFrameSize(lenField int64) (recBytes int64, padBytes int64) { // padding is stored in lower 3 bits of length MSB padBytes = int64((uint64(lenField) >> 56) & 0x7) } - return + return recBytes, padBytes } // isTornEntry determines whether the last entry of the WAL was partially written @@ -126,7 +129,7 @@ func (d *decoder) isTornEntry(data []byte) bool { return false } - fileOff := d.lastValidOff + 8 + fileOff := d.lastValidOff + frameSizeBytes curOff := 0 chunks := [][]byte{} // split data on sector boundaries diff --git a/vendor/github.com/coreos/etcd/wal/encoder.go b/vendor/github.com/coreos/etcd/wal/encoder.go index aac1e197e5..e8040b8dff 100644 --- a/vendor/github.com/coreos/etcd/wal/encoder.go +++ b/vendor/github.com/coreos/etcd/wal/encoder.go @@ -103,7 +103,7 @@ func encodeFrameSize(dataBytes int) (lenField uint64, padBytes int) { if padBytes != 0 { lenField |= uint64(0x80|padBytes) << 56 } - return + return lenField, padBytes } func (e *encoder) flush() error { diff --git a/vendor/github.com/coreos/etcd/wal/file_pipeline.go b/vendor/github.com/coreos/etcd/wal/file_pipeline.go index 5e32a0693c..3a1c57c1c9 100644 --- a/vendor/github.com/coreos/etcd/wal/file_pipeline.go +++ b/vendor/github.com/coreos/etcd/wal/file_pipeline.go @@ -55,7 +55,7 @@ func (fp *filePipeline) Open() (f *fileutil.LockedFile, err error) { case f = <-fp.filec: case err = <-fp.errc: } - return + return f, err } func (fp *filePipeline) Close() error { diff --git a/vendor/github.com/coreos/etcd/wal/wal.go b/vendor/github.com/coreos/etcd/wal/wal.go index 2cac25c1c9..96d01a23af 100644 --- a/vendor/github.com/coreos/etcd/wal/wal.go +++ b/vendor/github.com/coreos/etcd/wal/wal.go @@ -157,6 +157,48 @@ func Create(dirpath string, metadata []byte) (*WAL, error) { return w, nil } +func (w *WAL) renameWal(tmpdirpath string) (*WAL, error) { + if err := os.RemoveAll(w.dir); err != nil { + return nil, err + } + // On non-Windows platforms, hold the lock while renaming. Releasing + // the lock and trying to reacquire it quickly can be flaky because + // it's possible the process will fork to spawn a process while this is + // happening. The fds are set up as close-on-exec by the Go runtime, + // but there is a window between the fork and the exec where another + // process holds the lock. + if err := os.Rename(tmpdirpath, w.dir); err != nil { + if _, ok := err.(*os.LinkError); ok { + return w.renameWalUnlock(tmpdirpath) + } + return nil, err + } + w.fp = newFilePipeline(w.dir, SegmentSizeBytes) + df, err := fileutil.OpenDir(w.dir) + w.dirFile = df + return w, err +} + +func (w *WAL) renameWalUnlock(tmpdirpath string) (*WAL, error) { + // rename of directory with locked files doesn't work on windows/cifs; + // close the WAL to release the locks so the directory can be renamed. + plog.Infof("releasing file lock to rename %q to %q", tmpdirpath, w.dir) + w.Close() + if err := os.Rename(tmpdirpath, w.dir); err != nil { + return nil, err + } + // reopen and relock + newWAL, oerr := Open(w.dir, walpb.Snapshot{}) + if oerr != nil { + return nil, oerr + } + if _, _, _, err := newWAL.ReadAll(); err != nil { + newWAL.Close() + return nil, err + } + return newWAL, nil +} + // Open opens the WAL at the given snap. // The snap SHOULD have been previously saved to the WAL, or the following // ReadAll will fail. @@ -413,6 +455,7 @@ func (w *WAL) cut() error { return err } + // reopen newTail with its new path so calls to Name() match the wal filename format newTail.Close() if newTail, err = fileutil.LockFile(fpath, os.O_WRONLY, fileutil.PrivateFileMode); err != nil { @@ -460,6 +503,10 @@ func (w *WAL) ReleaseLockTo(index uint64) error { w.mu.Lock() defer w.mu.Unlock() + if len(w.locks) == 0 { + return nil + } + var smaller int found := false @@ -477,7 +524,7 @@ func (w *WAL) ReleaseLockTo(index uint64) error { // if no lock index is greater than the release index, we can // release lock up to the last one(excluding). - if !found && len(w.locks) != 0 { + if !found { smaller = len(w.locks) - 1 } diff --git a/vendor/github.com/coreos/etcd/wal/wal_unix.go b/vendor/github.com/coreos/etcd/wal/wal_unix.go deleted file mode 100644 index 82fd6a17a7..0000000000 --- a/vendor/github.com/coreos/etcd/wal/wal_unix.go +++ /dev/null @@ -1,44 +0,0 @@ -// Copyright 2016 The etcd Authors -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// +build !windows - -package wal - -import ( - "os" - - "github.com/coreos/etcd/pkg/fileutil" -) - -func (w *WAL) renameWal(tmpdirpath string) (*WAL, error) { - // On non-Windows platforms, hold the lock while renaming. Releasing - // the lock and trying to reacquire it quickly can be flaky because - // it's possible the process will fork to spawn a process while this is - // happening. The fds are set up as close-on-exec by the Go runtime, - // but there is a window between the fork and the exec where another - // process holds the lock. - - if err := os.RemoveAll(w.dir); err != nil { - return nil, err - } - if err := os.Rename(tmpdirpath, w.dir); err != nil { - return nil, err - } - - w.fp = newFilePipeline(w.dir, SegmentSizeBytes) - df, err := fileutil.OpenDir(w.dir) - w.dirFile = df - return w, err -} diff --git a/vendor/github.com/coreos/etcd/wal/wal_windows.go b/vendor/github.com/coreos/etcd/wal/wal_windows.go deleted file mode 100644 index 0b9e434cf5..0000000000 --- a/vendor/github.com/coreos/etcd/wal/wal_windows.go +++ /dev/null @@ -1,41 +0,0 @@ -// Copyright 2016 The etcd Authors -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -package wal - -import ( - "os" - - "github.com/coreos/etcd/wal/walpb" -) - -func (w *WAL) renameWal(tmpdirpath string) (*WAL, error) { - // rename of directory with locked files doesn't work on - // windows; close the WAL to release the locks so the directory - // can be renamed - w.Close() - if err := os.Rename(tmpdirpath, w.dir); err != nil { - return nil, err - } - // reopen and relock - newWAL, oerr := Open(w.dir, walpb.Snapshot{}) - if oerr != nil { - return nil, oerr - } - if _, _, _, err := newWAL.ReadAll(); err != nil { - newWAL.Close() - return nil, err - } - return newWAL, nil -} diff --git a/vendor/github.com/coreos/etcd/wal/walpb/record.pb.go b/vendor/github.com/coreos/etcd/wal/walpb/record.pb.go index 664fae1305..3ce63ddc2e 100644 --- a/vendor/github.com/coreos/etcd/wal/walpb/record.pb.go +++ b/vendor/github.com/coreos/etcd/wal/walpb/record.pb.go @@ -1,6 +1,5 @@ -// Code generated by protoc-gen-gogo. +// Code generated by protoc-gen-gogo. DO NOT EDIT. // source: record.proto -// DO NOT EDIT! /* Package walpb is a generated protocol buffer package. @@ -21,6 +20,8 @@ import ( math "math" + _ "github.com/gogo/protobuf/gogoproto" + io "io" ) @@ -122,24 +123,6 @@ func (m *Snapshot) MarshalTo(dAtA []byte) (int, error) { return i, nil } -func encodeFixed64Record(dAtA []byte, offset int, v uint64) int { - dAtA[offset] = uint8(v) - dAtA[offset+1] = uint8(v >> 8) - dAtA[offset+2] = uint8(v >> 16) - dAtA[offset+3] = uint8(v >> 24) - dAtA[offset+4] = uint8(v >> 32) - dAtA[offset+5] = uint8(v >> 40) - dAtA[offset+6] = uint8(v >> 48) - dAtA[offset+7] = uint8(v >> 56) - return offset + 8 -} -func encodeFixed32Record(dAtA []byte, offset int, v uint32) int { - dAtA[offset] = uint8(v) - dAtA[offset+1] = uint8(v >> 8) - dAtA[offset+2] = uint8(v >> 16) - dAtA[offset+3] = uint8(v >> 24) - return offset + 4 -} func encodeVarintRecord(dAtA []byte, offset int, v uint64) int { for v >= 1<<7 { dAtA[offset] = uint8(v&0x7f | 0x80) diff --git a/vendor/github.com/coreos/go-systemd/NOTICE b/vendor/github.com/coreos/go-systemd/NOTICE new file mode 100644 index 0000000000..23a0ada2fb --- /dev/null +++ b/vendor/github.com/coreos/go-systemd/NOTICE @@ -0,0 +1,5 @@ +CoreOS Project +Copyright 2018 CoreOS, Inc + +This product includes software developed at CoreOS, Inc. +(http://www.coreos.com/). diff --git a/vendor/github.com/coreos/go-systemd/README.md b/vendor/github.com/coreos/go-systemd/README.md index cb87a11245..cad04a8035 100644 --- a/vendor/github.com/coreos/go-systemd/README.md +++ b/vendor/github.com/coreos/go-systemd/README.md @@ -6,9 +6,11 @@ Go bindings to systemd. The project has several packages: - `activation` - for writing and using socket activation from Go +- `daemon` - for notifying systemd of service status changes - `dbus` - for starting/stopping/inspecting running services and units - `journal` - for writing to systemd's logging service, journald - `sdjournal` - for reading from journald by wrapping its C API +- `login1` - for integration with the systemd logind API - `machine1` - for registering machines/containers with systemd - `unit` - for (de)serialization and comparison of unit files @@ -18,10 +20,9 @@ An example HTTP server using socket activation can be quickly set up by followin https://github.com/coreos/go-systemd/tree/master/examples/activation/httpserver -## Journal +## systemd Service Notification -Using the pure-Go `journal` package you can submit journal entries directly to systemd's journal, taking advantage of features like indexed key/value pairs for each log entry. -The `sdjournal` package provides read access to the journal by wrapping around journald's native C API; consequently it requires cgo and the journal headers to be available. +The `daemon` package is an implementation of the [sd_notify protocol](https://www.freedesktop.org/software/systemd/man/sd_notify.html#Description). It can be used to inform systemd of service start-up completion, watchdog events, and other status changes. ## D-Bus @@ -45,6 +46,20 @@ Create `/etc/dbus-1/system-local.conf` that looks like this: ``` +## Journal + +### Writing to the Journal + +Using the pure-Go `journal` package you can submit journal entries directly to systemd's journal, taking advantage of features like indexed key/value pairs for each log entry. + +### Reading from the Journal + +The `sdjournal` package provides read access to the journal by wrapping around journald's native C API; consequently it requires cgo and the journal headers to be available. + +## logind + +The `login1` package provides functions to integrate with the [systemd logind API](http://www.freedesktop.org/wiki/Software/systemd/logind/). + ## machined The `machine1` package allows interaction with the [systemd machined D-Bus API](http://www.freedesktop.org/wiki/Software/systemd/machined/). diff --git a/vendor/github.com/coreos/go-systemd/journal/journal.go b/vendor/github.com/coreos/go-systemd/journal/journal.go index 7f434990d2..ef85a3ba24 100644 --- a/vendor/github.com/coreos/go-systemd/journal/journal.go +++ b/vendor/github.com/coreos/go-systemd/journal/journal.go @@ -103,7 +103,10 @@ func Send(message string, priority Priority, vars map[string]string) error { if !ok { return journalError("can't send file through non-Unix connection") } - unixConn.WriteMsgUnix([]byte{}, rights, nil) + _, _, err = unixConn.WriteMsgUnix([]byte{}, rights, nil) + if err != nil { + return journalError(err.Error()) + } } else if err != nil { return journalError(err.Error()) } @@ -165,7 +168,7 @@ func tempFd() (*os.File, error) { if err != nil { return nil, err } - syscall.Unlink(file.Name()) + err = syscall.Unlink(file.Name()) if err != nil { return nil, err }