
NDRS-668 - block validator - validate blocks similar to block_proposer #709

Merged
bors[bot] merged 6 commits into casper-network:master from dwerner:NDRS-668-block-validator-validation
Jan 6, 2021

Conversation

@dwerner
Contributor

@dwerner dwerner commented Jan 5, 2021

With this PR I started out trying to solve NDRS-668:

  1. add a state machine to block_validator that loads the chainspec from storage (needed to validate deploys).
  2. validate deploys against chainspec values
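The two steps above can be sketched as a minimal two-state machine. This is a sketch only, assuming hypothetical names (`BlockValidatorState`, a stub `Chainspec`), not the PR's actual types:

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real `Chainspec` loaded from storage.
#[derive(Debug)]
pub struct Chainspec {
    pub max_dependencies: u8,
}

// The component starts in `Loading` and only begins validating
// blocks once the chainspec has arrived from storage.
#[derive(Debug)]
pub enum BlockValidatorState {
    Loading,
    Ready { chainspec: Arc<Chainspec> },
}

impl BlockValidatorState {
    pub fn new() -> Self {
        BlockValidatorState::Loading
    }

    // Transition fired by the `Loaded` event once storage responds.
    pub fn on_loaded(&mut self, chainspec: Arc<Chainspec>) {
        *self = BlockValidatorState::Ready { chainspec };
    }

    pub fn is_ready(&self) -> bool {
        matches!(self, BlockValidatorState::Ready { .. })
    }
}
```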

That should have been it, but it turned out that BlockValidator was the source of many extra events on the queue:

The BlockValidator would fetch all of a block's deploys whenever it received a BlockValidationRequest, regardless of whether it had already validated that block or whether a previous request for the same block was still in flight. This PR therefore only fetches the deploys for a block that has not already been successfully validated; if it has, we exit early and return the validation result to the caller.
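A minimal sketch of that early-exit bookkeeping, with a `u64` standing in for the real block hash type and all names hypothetical:

```rust
use std::collections::HashMap;

// Simplified per-block validation state.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum ValidationState {
    InFlight,
    Valid,
}

#[derive(Default)]
pub struct BlockValidator {
    // Keyed by block hash (a plain u64 here for illustration).
    validation_states: HashMap<u64, ValidationState>,
}

// What to do with an incoming BlockValidationRequest.
#[derive(Debug, PartialEq)]
pub enum Action {
    RespondValid,    // already validated: answer immediately
    AttachResponder, // a fetch is already in flight: just wait for it
    StartFetching,   // first request for this block: fetch its deploys
}

impl BlockValidator {
    pub fn handle_request(&mut self, block_hash: u64) -> Action {
        match self.validation_states.get(&block_hash) {
            Some(ValidationState::Valid) => Action::RespondValid,
            Some(ValidationState::InFlight) => Action::AttachResponder,
            None => {
                self.validation_states
                    .insert(block_hash, ValidationState::InFlight);
                Action::StartFetching
            }
        }
    }

    pub fn mark_valid(&mut self, block_hash: u64) {
        self.validation_states.insert(block_hash, ValidationState::Valid);
    }
}
```

Repeated requests for the same block thus attach to the in-flight fetch instead of spawning new events on the queue.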

I also added a Debug dump of the queue items to file (piggybacking on the existing JSON dump): sending the node a SIGUSR1 signal produces a dump in queue_dump_debug.txt, which can aid in debugging the queue. Hashes are present in this dump, making it possible to correlate requests, responses, and other messages.
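The dump itself amounts to writing each queue item's Debug representation to a file. A sketch under that assumption, with the signal wiring omitted (the real node hooks this up to SIGUSR1; the function name is illustrative):

```rust
use std::fmt::Debug;
use std::fs::File;
use std::io::{self, Write};

// Write one line per queue item, using its Debug form. `{:?}` keeps
// hashes and other identifiers visible, so requests and responses
// can later be correlated by eye or with grep.
pub fn dump_queue_debug<T: Debug>(items: &[T], path: &str) -> io::Result<()> {
    let mut file = File::create(path)?;
    for item in items {
        writeln!(file, "{:?}", item)?;
    }
    Ok(())
}
```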

Contributor

@Fraser999 Fraser999 left a comment

Nitpicks only from me.

This is a very welcome PR - I can comfortably throw 1000 transfers at a 20-node nctl network without any of the nodes peaking above 80 MB of memory usage, whereas without this PR I see nodes topping out at 7 GB after fewer than 200 transfers. Nice work tracking down the issue.

/// An event changing the current state to BlockValidatorReady, once the chainspec has been
/// loaded.
#[display(fmt = "block validator loaded")]
Loaded { chainspec: Arc<Chainspec> },
Contributor

We could also use the approach that @marc-casperlabs used in the BlockProposer – basically turning it into an actor-like component that loads its state prior to launch.

Contributor Author

I largely followed what had been done in the block_proposer - what improvements do you think could be made, or did I miss something?

#[derive(Debug)]
pub(crate) struct BlockValidatorReady<T, I> {
/// Chainspec loaded for deploy validation.
chainspec: Arc<Chainspec>,
Contributor

BlockValidator shouldn't need the whole chainspec; it contains things like the genesis contracts (which are byte arrays). It should hold a smaller subset of only the configuration parameters it needs. Can be ticketed and done separately.

Contributor

@goral09 goral09 Jan 6, 2021

Another thing: this configuration will change based on the protocol version, so it cannot be set in stone. The BlockValidator may have to use different configurations for block #11 and later for block #12, and may even have to go back to the configuration from block #11 if messages arrive out of order (from other nodes, not internal ones).
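One way to sketch that per-height lookup (illustrative only; `VersionedConfigs` is not a type from this PR) is a BTreeMap keyed by the first block height at which each config takes effect:

```rust
use std::collections::BTreeMap;

// Hypothetical sketch: keep every activated config keyed by the first
// block height it applies to, and look up by the block being validated.
pub struct VersionedConfigs<C> {
    // activation height -> config in force from that height onward
    by_activation_height: BTreeMap<u64, C>,
}

impl<C> VersionedConfigs<C> {
    pub fn new() -> Self {
        VersionedConfigs {
            by_activation_height: BTreeMap::new(),
        }
    }

    pub fn insert(&mut self, activation_height: u64, config: C) {
        self.by_activation_height.insert(activation_height, config);
    }

    // The config for a block is the one with the greatest activation
    // height not exceeding the block's height, so an out-of-order
    // message about an older block still picks up the older config.
    pub fn config_for(&self, block_height: u64) -> Option<&C> {
        self.by_activation_height
            .range(..=block_height)
            .next_back()
            .map(|(_, config)| config)
    }
}
```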

Contributor Author

If I'm not mistaken, this is just an Arc reference to a shared value, so we aren't copying those byte arrays around. It seems like the same is done in the block_proposer as well.
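For illustration of the point being made: cloning an Arc copies a pointer and bumps a reference count, while the pointee (here a byte vector standing in for the genesis byte arrays) is never duplicated. The helper name is hypothetical:

```rust
use std::sync::Arc;

// Hand out a second handle to the same shared value. Only the
// refcount changes; the underlying allocation is untouched.
pub fn share<T>(value: Arc<T>) -> (Arc<T>, Arc<T>) {
    let second = Arc::clone(&value);
    (value, second)
}
```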

Contributor

Might be, we should change it in both places then.

Contributor Author

> Another thing, this configuration will change based on the protocol version so it cannot be set in stone. Block Validator may have to use different configurations for block #11, and later block #12 and may even have to go back to the configuration from the block #11 if the messages arrive out of order (from other nodes, not internal ones).

Yes, this should be handled when we work on chainspec upgrades.

Contributor

Is there a ticket for that, so that we don't lose track of it?

Contributor Author

I mentioned this to @Fraser999, he's working on designing our upgrade approach.

Contributor

Consensus has a ProtocolConfig that uses a subset of the original Chainspec and is actually created from one.
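The pattern being suggested can be sketched as a From conversion; the field names below are hypothetical (the real Chainspec is far larger):

```rust
// Hypothetical fields; the real chainspec also carries genesis byte
// arrays and much else the block validator never needs.
pub struct Chainspec {
    pub max_block_size: u32,
    pub max_dependencies: u8,
    pub genesis_contracts: Vec<u8>,
}

// The subset the block validator actually consumes.
#[derive(Debug, PartialEq)]
pub struct BlockValidatorConfig {
    pub max_block_size: u32,
    pub max_dependencies: u8,
}

impl From<&Chainspec> for BlockValidatorConfig {
    fn from(chainspec: &Chainspec) -> Self {
        // Copy out only the small scalar parameters; the large
        // genesis byte arrays stay behind.
        BlockValidatorConfig {
            max_block_size: chainspec.max_block_size,
            max_dependencies: chainspec.max_dependencies,
        }
    }
}
```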

Contributor

Side-note: The Arc is mostly there for transferring the chainspec around. I just did not feel like cloning it each time it was requested. I used a ProtocolConfig because that is what the code previously used, so I am currently neither for nor against keeping the actual Arc around :)


// TODO: Clean this up to use `or_insert_with_key` once
// https://github.com/rust-lang/rust/issues/71024 is stabilized.
match self.validation_states.entry(block) {
Contributor

How long does the block validator store the information about the block's validation state? Once a block is validated, is the entry removed from validation_states?

Contributor Author

It is only dropped if a block fails validation, and that behaviour existed before my changeset.

.flat_map(|deploy_hash| {
let chainspec = Arc::clone(&chainspec);
// For every request, increase the number of in-flight...
in_flight.inc(deploy_hash);
Contributor

in_flight uses a HashMap internally, so this might end up doing a lot of hashing when we increment the number of in-flight requests for every deploy in a block.
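A simplified sketch of such a counter (not the node's actual in_flight type); using the entry API at least keeps each increment to a single hash-and-probe:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Simplified in-flight counter: one entry per deploy hash.
pub struct InFlightCounter<K: Eq + Hash> {
    counts: HashMap<K, usize>,
}

impl<K: Eq + Hash> InFlightCounter<K> {
    pub fn new() -> Self {
        InFlightCounter {
            counts: HashMap::new(),
        }
    }

    // `entry` hashes `key` exactly once, whether or not it is present.
    pub fn inc(&mut self, key: K) -> usize {
        let count = self.counts.entry(key).or_insert(0);
        *count += 1;
        *count
    }

    // Decrement, removing the entry once it reaches zero.
    pub fn dec(&mut self, key: &K) -> usize {
        let remaining = match self.counts.get_mut(key) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return 0,
        };
        if remaining == 0 {
            self.counts.remove(key);
        }
        remaining
    }
}
```

Even so, incrementing per deploy means one hash per deploy in the block; batching the updates per block would reduce that further.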

) {
Event::DeployFound(deploy_hash)
} else {
Event::DeployMissing(deploy_hash)
Contributor

I am worried that this is swallowing the real reason this block cannot be validated – it turns an invalid deploy into a missing one. It's probably fine for now, I just want to put it out there.

Contributor Author

True, after all the work of fetching the deploys, we are returning a bool. I would like to rethink a few aspects of the block_validator at some point - for instance, batching the deploy fetching.

Contributor

@goral09 goral09 Jan 6, 2021

> batching the deploy fetching.

Oh yes, definitely that 👍 💯

Contributor

@goral09 goral09 left a comment

Looks good, no blockers from me.

@dwerner
Contributor Author

dwerner commented Jan 6, 2021

bors r+

@bors
Contributor

bors bot commented Jan 6, 2021

Build succeeded:

@bors bors bot merged commit 8b14931 into casper-network:master Jan 6, 2021
let deploy_valid = header.timestamp() + header.ttl() >= block_timestamp;
let num_deps_valid = header.dependencies().len() <= deploy_config.max_dependencies as usize;
ttl_valid && timestamp_valid && deploy_valid && num_deps_valid && all_deps_resolved()
header.is_valid(deploy_config, block_timestamp) && all_deps_resolved()
Contributor

This seems to be self.deploy_config currently. Will this be upgradeable? Then we will need to use the deploy config version appropriate for that block height/era. (Is there a ticket for that?)
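For reference, the folded-up check quoted above could look roughly like this (simplified types with plain u64 millisecond timestamps; these are not the node's actual definitions):

```rust
// Hypothetical simplified config and header types.
pub struct DeployConfig {
    pub max_ttl_millis: u64,
    pub max_dependencies: u8,
}

pub struct DeployHeader {
    pub timestamp: u64,
    pub ttl_millis: u64,
    pub dependencies: Vec<u64>,
}

impl DeployHeader {
    // Mirrors the checks in the diff: the TTL must respect the
    // configured maximum, the deploy must not be from the future,
    // must still be alive at the block's timestamp, and must not
    // exceed the dependency limit.
    pub fn is_valid(&self, config: &DeployConfig, block_timestamp: u64) -> bool {
        let ttl_valid = self.ttl_millis <= config.max_ttl_millis;
        let timestamp_valid = self.timestamp <= block_timestamp;
        let not_expired = self.timestamp + self.ttl_millis >= block_timestamp;
        let num_deps_valid =
            self.dependencies.len() <= config.max_dependencies as usize;
        ttl_valid && timestamp_valid && not_expired && num_deps_valid
    }
}
```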

rafal-ch pushed a commit that referenced this pull request Sep 11, 2024
* Versioned `GasCosts`

* Versioned `ConsensusParameters`.
Reduced the default `MAX_SIZE` to 110 KB.
Reduced the default `MAX_CONTRACT_SIZE` to 100 KB.

* Updated CHANGELOG.md

* Versioned `FeeParameters`

* Versioned `PredicateParameters`, `ScriptParameters`, `ContractParameters`,

* Versioned `TxParameters`

* Updated CHANGELOG.md

* Make CI happy

* Reshuffled the fields of the `Script` and `Create` transactions to unify the part used by all chargeable transactions

* Updated CHANGELOG.md

* Unified `Create` and `Script` logic via `ChargeableTransaction`

* Updated CHANGELOG.md

* Removed `bytecode_length` from the `Create` transaction

* Removed an obsolete test

* Fixed tests related to offsets