@mattfoley8 (Contributor) commented Jun 5, 2023

This PR includes:

  • Setting new PoS params on the GlobalParamsEntry and allowing them to be updated by the ParamUpdater.
  • Returning a default EpochEntry if not set. This is an edge case we hit when we first flip on the new PoS txn types, before we run the first OnEpochCompleteHook.
  • Consolidating + simplifying the two PoS block heights: StateSetup + ConsensusCutover.
  • Snapshotting global params, the validator set, and leader schedule.
  • Jailing inactive validators in the OnEpochCompleteHook.

mattfoley8 and others added 29 commits May 16, 2023 14:03
* Allow ParamUpdater to update PoS GlobalParams.

* Pass SnapshotAtEpochNumber to global param getters.

* Add TODOs to retrieve snapshot values.

* Add nil check for extra data param.
@mattfoley8 requested a review from lazynina June 5, 2023 15:25
@mattfoley8 changed the title from "Mf/pos merge 20230605" to "MF PoS Aggregate Changes (2023-06-05)" Jun 8, 2023
// PrefixCurrentRandomSeedHash: Retrieve the current RandomSeedHash.
// Prefix -> <RandomSeedHash [32]byte>.
PrefixCurrentRandomSeedHash []byte `prefix_id:"[84]" is_state:"true"`

Member:

I generally recommend including the intended types of the key components in the comment. I added them this time to show an example. This is not only helpful for future readers, but also for potential future debugging. For example, I didn't realize that SnapshotAtEpochNumber was a uvarint rather than a fixed-width int, but the comment here immediately makes that clear.

mattfoley8 (Contributor, Author):

I think we do want our SnapshotAtEpochNumber (in the db keys) to be a fixed-width uint64. I'm making that update. But point taken, it's also good to include types in these comments. Will add.
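
As an illustration of that convention, a prefix comment spelling out the key component types might look like the sketch below. The prefix name, prefix_id, and value type here are placeholders for illustration, not the actual additions in this PR.

    // PrefixSnapshotExampleEntry: Retrieve a snapshotted entry by epoch. (placeholder prefix)
    // Prefix, <SnapshotAtEpochNumber uint64 (fixed-width, big-endian)> -> <ExampleEntry>.
    PrefixSnapshotExampleEntry []byte `prefix_id:"[99]" is_state:"true"`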

if currentEpochEntry == nil {
return false, errors.New("IsEpochComplete: CurrentEpochEntry is nil, this should never happen")
}
return currentEpochEntry.FinalBlockHeight == blockHeight, nil

Member:

The first time this is run, I think there will be no CurrentEpochEntry in the db. This will cause bav.GetCurrentEpochEntry() to return an epoch entry with math.MaxUint64 as the block height, which will return false here. Is the epoch entry being set somewhere that I'm missing?

mattfoley8 (Contributor, Author):

There is an if statement a few lines above this that catches this edge case:

	if blockHeight == uint64(bav.Params.ForkHeights.ProofOfStake1StateSetupBlockHeight) {
		// As soon as we enable snapshotting for the first time, we should run the OnEpochCompleteHook.
		return true, nil
	}

We want to run our first OnEpochCompleteHook immediately following the ProofOfStake1StateSetupBlockHeight. The current (default) EpochEntry will be EpochNumber: 0, FinalBlockHeight: MaxUint64, but that FinalBlockHeight is ignored. We run the first OnEpochCompleteHook anyway. After that runs, our first "real" epoch starts with EpochNumber: 1, FinalBlockHeight: ProofOfStake1StateSetupBlockHeight + params.EpochDurationNumBlocks.
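
For reference, the default-entry behavior described here might look roughly like the following sketch. The function name, fields, and MaxUint64 sentinel follow this thread; the inner db/view helper is an assumed name for illustration.

    // Sketch: fall back to a default entry when no CurrentEpochEntry exists yet,
    // i.e. between the StateSetup block height and the first OnEpochCompleteHook.
    func (bav *UtxoView) GetCurrentEpochEntry() (*EpochEntry, error) {
        epochEntry, err := bav.getCurrentEpochEntryFromViewOrDb() // assumed helper name
        if err != nil {
            return nil, errors.Wrapf(err, "GetCurrentEpochEntry: ")
        }
        if epochEntry == nil {
            // Default: epoch zero with a sentinel FinalBlockHeight; the StateSetup-height
            // check handles the first epoch completion, so this sentinel is never matched.
            epochEntry = &EpochEntry{EpochNumber: 0, FinalBlockHeight: math.MaxUint64}
        }
        return epochEntry, nil
    }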

Member:

Ah got it. That makes sense. I added a comment to clarify this in RunEpochCompleteHook.

}

// Snapshot the current GlobalActiveStakeAmountNanos.
globalActiveStakeAmountNanos, err := bav.GetGlobalActiveStakeAmountNanos()

Member:

When we update our code to only snapshot ConsensusMaxNumValidators, we will want to get rid of this and replace it with a sum over all the validators. But it's OK to leave it for now.
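
A rough sketch of what that replacement might look like; the getter name is assumed, and stake is shown as a uint64 for brevity even though the real code may use a larger integer type.

    // Sketch: derive the snapshot's total stake by summing over the snapshotted
    // validator set rather than snapshotting GlobalActiveStakeAmountNanos directly.
    validatorEntries, err := bav.GetTopActiveValidatorsByStake(ConsensusMaxNumValidators) // assumed helper
    if err != nil {
        return errors.Wrapf(err, "RunOnEpochCompleteHook: error retrieving top validators: ")
    }
    totalStakeAmountNanos := uint64(0)
    for _, validatorEntry := range validatorEntries {
        totalStakeAmountNanos += validatorEntry.TotalStakeAmountNanos // simplified to uint64 here
    }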

bav._setSnapshotValidatorEntry(validatorEntry, snapshotAtEpochNumber)

// Check if we should jail the validator.
shouldJailValidator, err := bav.ShouldJailValidator(validatorEntry, blockHeight)

Member:

It feels like jailing validators in the snapshotting code is a bit overloaded. I was imagining something more like the following, which un-commingles the jailing from the snapshotting.

func RunOnEpochCompleted() {
    // Compute isLastBlockInEpoch. Return if it's not.
    // Jail validators that haven't been active
    // Pay all the top stakers (sofonias will handle this)
    // Snapshot validators
    // Snapshot leader schedule
    // Snapshot stakers (sofonias will augment it to do this)
    // Snapshot GlobalParams
    // Update CurrentEpochEntry
}

There is another issue, which is whether this ORDER of "jailing" -> "paying previously snapshotted people" -> "snapshotting new people" is "correct." I played with it a bit, and I think jailing first makes sense. But we also have to think of it in the context of validators joining and leaving the "active set" in the event that we introduce a ConsensusMaxNumValidators. In particular, if we only allow votes by the top 100 validators by stake, you could imagine a situation where validator 101 doesn't vote because he's not in the top set, but then his stake increases and he IS in the top set BUT hasn't voted. I think he actually wouldn't be jailed in this case because he'll vote during the first epoch when he joins the active set, which will show that he's been active. But I'm just writing it out to illustrate the potential edge case of someone joining the active set.

But sofonias will be the one dealing with this anyway.
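
For illustration, the activity check being reasoned about above might look like the sketch below. The signature follows the call site in the diff, but the field and parameter names used for the inactivity window are assumptions, not the PR's actual identifiers.

    // Sketch: jail a validator only if it has not voted for more than some grace
    // period of epochs. A validator that just entered the active set votes during
    // that same epoch, so its last-active epoch is current and it is not jailed
    // on joining.
    func (bav *UtxoView) ShouldJailValidator(validatorEntry *ValidatorEntry, blockHeight uint64) (bool, error) {
        currentEpochNumber, err := bav.GetCurrentEpochNumber() // assumed getter name
        if err != nil {
            return false, errors.Wrapf(err, "ShouldJailValidator: ")
        }
        // LastActiveAtEpochNumber and JailInactiveValidatorGracePeriodEpochs are assumed names.
        return currentEpochNumber-validatorEntry.LastActiveAtEpochNumber >
            JailInactiveValidatorGracePeriodEpochs, nil
    }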

if err != nil {
return 0, errors.Wrapf(err, "GetSnapshotEpochNumber: problem retrieving CurrentEpochNumber: ")
}
if currentEpochNumber < SnapshotLookbackNumEpochs {

Member:

I'd add a comment here. It's a bit odd to be calling this function before we have run 2 snapshots. I'd almost prefer to error if we can do it in a way that doesn't break things.

mattfoley8 (Contributor, Author):

I think we want to return 0. We start snapshotting with our StateSetup block height, so we should have the correct number of snapshots and not hit this case once we hit the ConsensusCutover block height. This case will only be hit immediately following the StateSetup block height. We run one OnEpochCompleteHook right away on the StateSetup block height which will increment our CurrentEpochNumber from zero (the starting default) to one. Then we wait one epoch and run our second OnEpochCompleteHook to increment our CurrentEpochNumber from one to two. At this point, we will have the correct number of snapshots and no longer hit this edge case.

The question, then, is what happens with snapshot values we need to use in that first block (where CurrentBlockHeight = StateSetup block height) and in the first epoch after that. The only snapshot values that we use relate to our new PoS txn types. We pull the snapshot GlobalParamsEntry to retrieve the StakeLockupEpochDuration and the ValidatorJailEpochDuration. Both of these impact the new txn types, which are unlocked after the StateSetup block height. The ValidatorJailEpochDuration value doesn't really matter, since no validators will be jailed until the ConsensusCutover block height. For the StakeLockupEpochDuration (and all other snapshot GlobalParamsEntry values), if there is no snapshot value, we return a GlobalParamsEntry with just our defaults, which is what we intend.

I will add all of this as a comment.
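
That edge-case handling would look roughly like the sketch below; the current-epoch getter name is assumed, while the fallback-to-zero logic and error string follow the diff.

    // Sketch: before SnapshotLookbackNumEpochs OnEpochCompleteHooks have run there is
    // no real snapshot to look back to, so return 0 and let the snapshot getters fall
    // back to defaults (e.g. a default GlobalParamsEntry).
    func (bav *UtxoView) GetSnapshotEpochNumber() (uint64, error) {
        currentEpochNumber, err := bav.GetCurrentEpochNumber() // assumed getter name
        if err != nil {
            return 0, errors.Wrapf(err, "GetSnapshotEpochNumber: problem retrieving CurrentEpochNumber: ")
        }
        if currentEpochNumber < SnapshotLookbackNumEpochs {
            return 0, nil
        }
        return currentEpochNumber - SnapshotLookbackNumEpochs, nil
    }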

}

func (bav *UtxoView) _flushSnapshotValidatorEntriesToDbWithTxn(txn *badger.Txn, blockHeight uint64) error {
for mapKey, validatorEntry := range bav.SnapshotValidatorEntries {

Member:

I think there is a bug here. You need to first DELETE all the entries then PUT all the entries. I think the code as it's written now will have messed up indexes after a few flushes. Consider the following example:

  • PUT (epoch=1, validatorpkid=validator1, stakeAmount=5)
    • Creates 5->validator1
  • UPDATE (epoch=1, validatorpkid=validator1, stakeAmount=6)
    • Creates 6->validator1 WITHOUT deleting 5->validator1

In order to hit this bug, you'd need to do two snapshot flushes within the same epoch, which I don't think can happen. But adding the delete makes it bullet-proof and additionally makes it conform to all our other flushes. I would do this for all of them.

Also when you add the DELETE, I'd do it by (epoch, pkid) so that you just look up all the entries that correspond to that validator in that snapshot and kill them before you do a PUT.

mattfoley8 (Contributor, Author) commented Jun 21, 2023:

In order to hit this bug, you'd need to do two snapshot flushes within the same epoch which I don't think can happen.

That's right. That should never happen. Snapshot values should be immutable once they're set in the OnEpochCompleteHook. For that reason, I didn't add delete functionality. IMO it's wasted db ops. But I will add for consistency. Easy enough to rip out later.

mattfoley8 (Contributor, Author):

A snapshot validator's TotalStakeAmountNanos should never change. But I did add this delete-then-set pattern.
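
A rough sketch of that delete-then-set pattern, with placeholder DB helper names (the actual helpers and map-key fields in the PR may differ):

    // Sketch: delete any existing snapshot rows for this (validator PKID, epoch) pair
    // before writing, so a changed TotalStakeAmountNanos can never leave a stale
    // stake-keyed index row behind.
    for mapKey, validatorEntry := range bav.SnapshotValidatorEntries {
        if err := DBDeleteSnapshotValidatorEntryWithTxn( // placeholder helper name
            txn, mapKey.ValidatorPKID, mapKey.SnapshotAtEpochNumber,
        ); err != nil {
            return errors.Wrapf(err, "_flushSnapshotValidatorEntriesToDbWithTxn: ")
        }
        if validatorEntry.isDeleted {
            continue
        }
        if err := DBPutSnapshotValidatorEntryWithTxn( // placeholder helper name
            txn, validatorEntry, mapKey.SnapshotAtEpochNumber, blockHeight,
        ); err != nil {
            return errors.Wrapf(err, "_flushSnapshotValidatorEntriesToDbWithTxn: ")
        }
    }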

mapKey.SnapshotAtEpochNumber,
)
}
if err := DBPutSnapshotLeaderScheduleValidatorWithTxn(

Member:

Here again I would follow the pattern of DELETE -> PUT

mattfoley8 (Contributor, Author) commented Jun 21, 2023:

The key-value here is <SnapshotAtEpochNumber uint64, LeaderIndex uint16> -> <ValidatorPKID *PKID>. IMO there isn't a convenient place to store the isDeleted field. IMO we shouldn't delete the leader schedule rows, only overwrite. Also, you may be suggesting we do an O(n^2) operation to make sure the ValidatorPKID doesn't show up elsewhere in the leader schedule when we set it (otherwise delete it), but IMO that's not worth it.

The only edge case that could happen here is if we update the leader schedule length, and one schedule has 100 entries while the next has 99. There would be one validator left over from the first leader schedule in the 100th slot. But 1) we shouldn't ever be overwriting a leader schedule once set, and 2) a change to the "how many validators do we put in a leader schedule" param wouldn't happen within the same epoch.
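
For reference, the key layout described here could be encoded roughly like this (the function and prefix names are illustrative, not the PR's actual identifiers; byte widths follow the types above, using encoding/binary):

    // Sketch: <Prefix> | <SnapshotAtEpochNumber uint64, 8 bytes big-endian> |
    // <LeaderIndex uint16, 2 bytes big-endian> -> <ValidatorPKID>.
    func DBKeyForSnapshotLeaderScheduleValidator(snapshotAtEpochNumber uint64, leaderIndex uint16) []byte {
        key := append([]byte{}, PrefixSnapshotLeaderSchedule...) // assumed prefix name
        key = binary.BigEndian.AppendUint64(key, snapshotAtEpochNumber)
        key = binary.BigEndian.AppendUint16(key, leaderIndex)
        return key
    }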

@mattfoley8 merged commit a7df1ca into feature/proof-of-stake Jun 26, 2023
@mattfoley8 deleted the mf/pos-merge-20230605 branch June 26, 2023 16:40