MF PoS Aggregate Changes (2023-06-05) #545

Conversation
Commits:
- Add default epoch entry.
- Allow ParamUpdater to update PoS GlobalParams. Pass SnapshotAtEpochNumber to global param getters. Add TODOs to retrieve snapshot values. Add nil check for extra data param.
- …-hook Snapshot GlobalParams, ValidatorSet, GlobalActiveStake, and LeaderSchedule
- Mf/jail inactive validators
- …obal-params Fix bugs in updating global params.
- …-string-util Add validator status ToString util.
- …-signature Rename VotingPublicKeySignature to VotingAuthorization.
- …-params-entry-defaults Refactor merging GlobalParamsEntry defaults.
// PrefixCurrentRandomSeedHash: Retrieve the current RandomSeedHash.
// Prefix -> <RandomSeedHash [32]byte>.
PrefixCurrentRandomSeedHash []byte `prefix_id:"[84]" is_state:"true"`
I generally recommend including the intended types of the key components in the comment. I added them this time to show an example. This is not only helpful for future readers, but also for potential future debugging. For example, I didn't realize that SnapshotAtEpochNumber was a uvarint rather than a fixed-width int, but the comment here now makes that immediately clear.
I think we do want our SnapshotAtEpochNumber (in the db keys) to be a fixed-width uint64. I'm making that update. But point taken, it's also good to include types in these comments. Will add.
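To make the difference concrete, here is a small standalone illustration (not from the PR) of why a fixed-width big-endian uint64 sorts correctly under Badger's lexicographic key ordering while a uvarint does not:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

func main() {
	// Fixed-width big-endian uint64: byte order matches numeric order,
	// which is what Badger key iteration relies on.
	a := make([]byte, 8)
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(a, 255)
	binary.BigEndian.PutUint64(b, 256)
	fmt.Println(bytes.Compare(a, b)) // -1: 255 sorts before 256

	// Uvarint: byte order can disagree with numeric order.
	a = binary.AppendUvarint(nil, 255) // encodes as [0xff, 0x01]
	b = binary.AppendUvarint(nil, 256) // encodes as [0x80, 0x02]
	fmt.Println(bytes.Compare(a, b)) // 1: 255 sorts after 256
}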
if currentEpochEntry == nil {
	return false, errors.New("IsEpochComplete: CurrentEpochEntry is nil, this should never happen")
}
return currentEpochEntry.FinalBlockHeight == blockHeight, nil
The first time this is run, I think there will be no CurrentEpochEntry in the db. This will cause bav.GetCurrentEpochEntry() to return an epoch entry with math.MaxUint64 as the block height, so this check will return false. Is the epoch entry being set somewhere that I'm missing?
There is an if statement a few lines above this that catches this edge case:
if blockHeight == uint64(bav.Params.ForkHeights.ProofOfStake1StateSetupBlockHeight) {
// As soon as we enable snapshotting for the first time, we should run the OnEpochCompleteHook.
return true, nil
}
We want to run our first OnEpochCompleteHook immediately following the ProofOfStake1StateSetupBlockHeight. The current (default) EpochEntry will be EpochNumber: 0, FinalBlockHeight: MaxUint64, but that FinalBlockHeight is ignored: we run the first OnEpochCompleteHook anyway. After that runs, our first "real" epoch starts with EpochNumber: 1, FinalBlockHeight: ProofOfStake1StateSetupBlockHeight + params.EpochDurationNumBlocks.
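For reference, a minimal sketch of how the full check fits together, pieced together from the snippets quoted in this thread (not necessarily the exact code in the PR):

func (bav *UtxoView) IsEpochComplete(blockHeight uint64) (bool, error) {
	// As soon as we enable snapshotting for the first time, we should run
	// the OnEpochCompleteHook, regardless of the default EpochEntry's
	// FinalBlockHeight of math.MaxUint64.
	if blockHeight == uint64(bav.Params.ForkHeights.ProofOfStake1StateSetupBlockHeight) {
		return true, nil
	}
	currentEpochEntry, err := bav.GetCurrentEpochEntry()
	if err != nil {
		return false, errors.Wrapf(err, "IsEpochComplete: ")
	}
	if currentEpochEntry == nil {
		return false, errors.New("IsEpochComplete: CurrentEpochEntry is nil, this should never happen")
	}
	return currentEpochEntry.FinalBlockHeight == blockHeight, nil
}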
Ah got it. That makes sense. I added a comment to clarify this in RunEpochCompleteHook.
}

// Snapshot the current GlobalActiveStakeAmountNanos.
globalActiveStakeAmountNanos, err := bav.GetGlobalActiveStakeAmountNanos()
When we update our code to only snapshot ConsensusMaxNumValidators, we will want to get rid of this and replace it with a sum over all the validators. But it's OK to leave it for now.
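A rough sketch of what that replacement might look like, using a simplified stand-in for the real ValidatorEntry type (the actual entry and stake representation live in the core repo):

package main

import "math/big"

// Simplified stand-in for the real ValidatorEntry type.
type ValidatorEntry struct {
	TotalStakeAmountNanos *big.Int
}

// Sum stake across the snapshotted validator set instead of snapshotting
// GlobalActiveStakeAmountNanos separately.
func sumValidatorStake(validatorEntries []*ValidatorEntry) *big.Int {
	total := big.NewInt(0)
	for _, validatorEntry := range validatorEntries {
		total.Add(total, validatorEntry.TotalStakeAmountNanos)
	}
	return total
}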
bav._setSnapshotValidatorEntry(validatorEntry, snapshotAtEpochNumber)

// Check if we should jail the validator.
shouldJailValidator, err := bav.ShouldJailValidator(validatorEntry, blockHeight)
It feels like jailing validators in the snapshotting code is a bit overloaded. I was imagining something more like the following, which un-commingles the jailing from the snapshotting.
func RunOnEpochComplete() {
// Compute isLastBlockInEpoch. Return if it's not.
// Jail validators that haven't been active
// Pay all the top stakers (sofonias will handle this)
// Snapshot validators
// Snapshot leader schedule
// Snapshot stakers (sofonias will augment it to do this)
// Snapshot GlobalParams
// Update CurrentEpochEntry
}
There is another issue, which is whether this ORDER of "jailing" -> "paying previously snapshotted people" -> "snapshotting new people" is "correct." I played with it a bit, and I think jailing first makes sense. But we also have to think of it in the context of validators joining and leaving the "active set" in the event that we introduce a ConsensusMaxNumValidators. In particular, if we only allow votes by the top 100 validators by stake, you could imagine a situation where validator 101 doesn't vote because he's not in the top set, but then his stake increases and he IS in the top set BUT hasn't voted. I think he actually wouldn't be jailed in this case because he'll vote during the first epoch after he joins the active set, which will show that he's been active. But I'm just writing it out to illustrate the potential edge case of someone joining the active set.
But sofonias will be the one dealing with this anyway.
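A hedged Go sketch of the decomposed hook outlined above; every helper called here is hypothetical and stands in for one bullet in the outline:

func (bav *UtxoView) RunOnEpochComplete(blockHeight uint64) error {
	// Compute isLastBlockInEpoch. Return if it's not.
	isLastBlockInEpoch, err := bav.IsEpochComplete(blockHeight)
	if err != nil {
		return errors.Wrapf(err, "RunOnEpochComplete: ")
	}
	if !isLastBlockInEpoch {
		return nil
	}
	// Each helper below is hypothetical; the ordering follows the outline.
	bav.jailInactiveValidators(blockHeight)
	bav.payTopStakers()          // sofonias will handle this
	bav.snapshotValidators()
	bav.snapshotLeaderSchedule()
	bav.snapshotStakers()        // sofonias will augment it to do this
	bav.snapshotGlobalParams()
	return bav.updateCurrentEpochEntry(blockHeight)
}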
if err != nil {
	return 0, errors.Wrapf(err, "GetSnapshotEpochNumber: problem retrieving CurrentEpochNumber: ")
}
if currentEpochNumber < SnapshotLookbackNumEpochs {
I'd add a comment here. It's a bit odd to be calling this function before we have run 2 snapshots. I'd almost prefer to error if we can do it in a way that doesn't break things.
I think we want to return 0. We start snapshotting with our StateSetup block height, so we should have the correct number of snapshots and not hit this case once we hit the ConsensusCutover block height. This case will only be hit immediately following the StateSetup block height. We run one OnEpochCompleteHook right away on the StateSetup block height which will increment our CurrentEpochNumber from zero (the starting default) to one. Then we wait one epoch and run our second OnEpochCompleteHook to increment our CurrentEpochNumber from one to two. At this point, we will have the correct number of snapshots and no longer hit this edge case.
The remaining question is which snapshot values we need in that first block, where CurrentBlockHeight equals the StateSetup block height, and in the first epoch after that. The only snapshot values that we use relate to our new PoS txn types. We pull the snapshot GlobalParamsEntry to retrieve the StakeLockupEpochDuration and the ValidatorJailEpochDuration. Both of these affect the new txn types, which are unlocked after the StateSetup block height. The ValidatorJailEpochDuration value doesn't really matter since no validators will be jailed until the ConsensusCutover block height. For the StakeLockupEpochDuration (and all other snapshot GlobalParamsEntry values), if there is no snapshot value, we return a GlobalParamsEntry with just our defaults, which is what we intend.
I will add all of this as a comment.
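For concreteness, a sketch of how GetSnapshotEpochNumber might read with that comment folded in (GetCurrentEpochNumber is assumed from context; the subtraction is an inference from the guard):

func (bav *UtxoView) GetSnapshotEpochNumber() (uint64, error) {
	currentEpochNumber, err := bav.GetCurrentEpochNumber()
	if err != nil {
		return 0, errors.Wrapf(err, "GetSnapshotEpochNumber: problem retrieving CurrentEpochNumber: ")
	}
	// Between the StateSetup block height and the second OnEpochCompleteHook,
	// CurrentEpochNumber is 0 or 1 and no snapshot exists yet. Returning 0
	// makes getters fall back to defaults (e.g. a default GlobalParamsEntry),
	// which is the intended behavior.
	if currentEpochNumber < SnapshotLookbackNumEpochs {
		return 0, nil
	}
	return currentEpochNumber - SnapshotLookbackNumEpochs, nil
}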
}

func (bav *UtxoView) _flushSnapshotValidatorEntriesToDbWithTxn(txn *badger.Txn, blockHeight uint64) error {
	for mapKey, validatorEntry := range bav.SnapshotValidatorEntries {
I think there is a bug here. You need to first DELETE all the entries then PUT all the entries. I think the code as it's written now will have messed up indexes after a few flushes. Consider the following example:
- PUT (epoch=1, validatorpkid=validator1, stakeAmount=5)
- Creates 5->validator1
- UPDATE (epoch=1, validatorpkid=validator1, stakeAmount=6)
- Creates 6->validator1 WITHOUT deleting 5->validator1
In order to hit this bug, you'd need to do two snapshot flushes within the same epoch, which I don't think can happen. But adding the delete makes it bullet-proof, and it additionally makes it conform to all our other flushes. I would do this for all of them.
Also when you add the DELETE, I'd do it by (epoch, pkid) so that you just look up all the entries that correspond to that validator in that snapshot and kill them before you do a PUT.
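A sketch of the DELETE -> PUT flush this suggests; the DB helper names here are assumed, mirroring the put helpers visible elsewhere in the diff:

func (bav *UtxoView) _flushSnapshotValidatorEntriesToDbWithTxn(txn *badger.Txn, blockHeight uint64) error {
	for mapKey, validatorEntry := range bav.SnapshotValidatorEntries {
		// First DELETE by (epoch, pkid) so a stale stake-keyed index row
		// (e.g. 5->validator1) cannot survive an update to 6->validator1.
		if err := DBDeleteSnapshotValidatorEntryWithTxn(
			txn, mapKey.ValidatorPKID, mapKey.SnapshotAtEpochNumber,
		); err != nil {
			return errors.Wrapf(err, "_flushSnapshotValidatorEntriesToDbWithTxn: ")
		}
		// Then PUT the current version of the entry.
		if err := DBPutSnapshotValidatorEntryWithTxn(
			txn, validatorEntry, mapKey.SnapshotAtEpochNumber, blockHeight,
		); err != nil {
			return errors.Wrapf(err, "_flushSnapshotValidatorEntriesToDbWithTxn: ")
		}
	}
	return nil
}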
> In order to hit this bug, you'd need to do two snapshot flushes within the same epoch which I don't think can happen.
That's right. That should never happen. Snapshot values should be immutable once they're set in the OnEpochCompleteHook. For that reason, I didn't add delete functionality. IMO it's wasted db ops. But I will add it for consistency. Easy enough to rip out later.
A snapshot validator's TotalStakeAmountNanos should never change. But I did add the delete-then-set pattern.
	mapKey.SnapshotAtEpochNumber,
)
}
if err := DBPutSnapshotLeaderScheduleValidatorWithTxn(
Here again I would follow the pattern of DELETE -> PUT.
The key-value here is SnapshotAtEpochNumber uint64, LeaderIndex uint16 --> ValidatorPKID *PKID. IMO there isn't a convenient place to store the isDeleted field. IMO we shouldn't delete the leader schedule rows, only overwrite. Also you may be suggesting we do an O(n^2) operation to make sure the ValidatorPKID doesn't show up elsewhere in the leader schedule when we set it (otherwise delete it), but IMO not worth it.
The only edge case that could happen here is if we update the leader schedule length, e.g. the first schedule has 100 entries and the next has 99; there would be one validator left over from the first leader schedule in the 100th slot. But 1) we shouldn't ever be overwriting a leader schedule once set, and 2) a change to the "how many validators do we put in a leader schedule" param wouldn't happen within the same epoch.
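To make that key layout concrete, a hypothetical sketch of how the (SnapshotAtEpochNumber, LeaderIndex) key might be assembled; the prefix argument and helper name are illustrative, not the PR's actual code:

// <prefix><SnapshotAtEpochNumber uint64 big-endian><LeaderIndex uint16 big-endian> -> <ValidatorPKID>
func makeSnapshotLeaderScheduleKey(prefix []byte, snapshotAtEpochNumber uint64, leaderIndex uint16) []byte {
	key := append([]byte{}, prefix...)
	epochBytes := make([]byte, 8)
	binary.BigEndian.PutUint64(epochBytes, snapshotAtEpochNumber)
	key = append(key, epochBytes...)
	indexBytes := make([]byte, 2)
	binary.BigEndian.PutUint16(indexBytes, leaderIndex)
	// Writing to an existing (epoch, index) key overwrites the old
	// ValidatorPKID in place, which is why no separate delete is needed.
	return append(key, indexBytes...)
}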