NDRS-668 - block validator - validate blocks similar to block_proposer#709
Conversation
Fraser999
left a comment
Nitpicks only from me.
This is a very welcome PR - I can comfortably throw 1000 transfers at a 20-node nctl network without any of the nodes peaking above 80 MB memory usage, whereas without this PR I see nodes topping out at 7 GB after fewer than 200 transfers. Nice work tracking down the issue.
```rust
/// An event changing the current state to BlockValidatorReady, once the chainspec has been
/// loaded.
#[display(fmt = "block validator loaded")]
Loaded { chainspec: Arc<Chainspec> },
```
We could also use the approach that @marc-casperlabs used in the BlockProposer – basically turning it into an actor-like component that loads its state prior to launch.
I largely followed what had been done in the block_proposer - what improvements do you think could be made, or did I miss something?
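For illustration, the actor-like pattern mentioned above, a component that starts without its state and only becomes operational once loading completes, can be sketched as follows. This is a minimal, hypothetical example; the type and method names are illustrative, not the crate's actual API.

```rust
use std::sync::Arc;

// Placeholder for the real chainspec type.
struct Chainspec;

// The component has two states: before and after its state has been loaded.
enum BlockValidator {
    Loading,
    Ready { chainspec: Arc<Chainspec> },
}

impl BlockValidator {
    // Transition to the operational state once the chainspec arrives.
    fn handle_loaded(&mut self, chainspec: Arc<Chainspec>) {
        *self = BlockValidator::Ready { chainspec };
    }

    // Work should be rejected (or queued) until this returns true.
    fn is_ready(&self) -> bool {
        matches!(self, BlockValidator::Ready { .. })
    }
}

fn main() {
    let mut validator = BlockValidator::Loading;
    assert!(!validator.is_ready());
    validator.handle_loaded(Arc::new(Chainspec));
    assert!(validator.is_ready());
}
```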
```rust
#[derive(Debug)]
pub(crate) struct BlockValidatorReady<T, I> {
    /// Chainspec loaded for deploy validation.
    chainspec: Arc<Chainspec>,
```
BlockValidator shouldn't need the whole chainspec, which has things like the Genesis contracts (those are byte arrays). It should have a smaller subset of the configuration parameters that it actually needs. This can be ticketed and done separately.
Another thing: this configuration will change based on the protocol version, so it cannot be set in stone. The Block Validator may have to use different configurations for block #11 and later block #12, and may even have to go back to the configuration from block #11 if the messages arrive out of order (from other nodes, not internal ones).
If I'm not mistaken, this is just an Arc reference to a shared value, so we aren't copying those byte arrays around. It seems like the same is done in the block_proposer as well.
Might be, we should change it in both places then.
> Another thing, this configuration will change based on the protocol version so it cannot be set in stone. Block Validator may have to use different configurations for block #11, and later block #12, and may even have to go back to the configuration from block #11 if the messages arrive out of order (from other nodes, not internal ones).
Yes, this should be handled when we work on chainspec upgrades.
Is there a ticket for that so that we don't lose track of it?
I mentioned this to @Fraser999, he's working on designing our upgrade approach.
Consensus has ProtocolConfig that uses a subset of the original Chainspec and is actually created from one.
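Deriving a trimmed-down config from the full chainspec, the way consensus builds its ProtocolConfig, could look like this hedged sketch. All field names here are hypothetical, chosen only to show the shape of the conversion.

```rust
// Hypothetical stand-in for the full chainspec; field names are illustrative.
#[derive(Debug)]
struct Chainspec {
    max_dependencies: u8,
    max_ttl_millis: u64,
    genesis_contracts: Vec<u8>, // large byte arrays the validator does not need
}

// The smaller subset of parameters the block validator actually uses.
#[derive(Debug, Clone, PartialEq)]
struct DeployConfig {
    max_dependencies: u8,
    max_ttl_millis: u64,
}

// Extract only what is needed, leaving the large byte arrays behind.
impl From<&Chainspec> for DeployConfig {
    fn from(chainspec: &Chainspec) -> Self {
        DeployConfig {
            max_dependencies: chainspec.max_dependencies,
            max_ttl_millis: chainspec.max_ttl_millis,
        }
    }
}

fn main() {
    let chainspec = Chainspec {
        max_dependencies: 10,
        max_ttl_millis: 86_400_000,
        genesis_contracts: vec![0; 1_000],
    };
    let config = DeployConfig::from(&chainspec);
    assert_eq!(config.max_dependencies, 10);
}
```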
Side-note: The Arc is mostly there for transferring the chainspec around. I just did not feel like cloning it each time it was requested. I used a ProtocolConfig because that is what the code previously used, so I am currently neither for nor against keeping the actual Arc around :)
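To make the Arc point concrete: cloning an `Arc` copies only a pointer and bumps a reference count; the underlying byte arrays are shared, not duplicated. A small demonstration:

```rust
use std::sync::Arc;

fn main() {
    // A large byte buffer standing in for the chainspec's genesis contracts.
    let chainspec_bytes: Arc<Vec<u8>> = Arc::new(vec![0u8; 1_000_000]);

    // Cloning the Arc copies only the pointer and increments the refcount;
    // the million-byte buffer itself is not duplicated.
    let shared = Arc::clone(&chainspec_bytes);

    // Both handles point at the same allocation.
    assert!(Arc::ptr_eq(&chainspec_bytes, &shared));
    assert_eq!(Arc::strong_count(&chainspec_bytes), 2);
}
```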
```rust
// TODO: Clean this up to use `or_insert_with_key` once
// https://github.com/rust-lang/rust/issues/71024 is stabilized.
match self.validation_states.entry(block) {
```
How long does the block validator store the information about the block's validation state? Once a block is validated, is the entry removed from validation_states?
It is only dropped if a block fails validation, and that behaviour existed before my changeset.
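`or_insert_with_key` has since been stabilized (Rust 1.50), resolving the tracking issue linked in the TODO. A minimal sketch of the cleaned-up pattern, with illustrative types standing in for the real validation state:

```rust
use std::collections::HashMap;

/// Returns the cached state for `block`, computing it from the key itself on
/// first insertion. `or_insert_with_key` hands the closure a reference to the
/// key, avoiding a separate clone of the block hash before the `entry` call.
/// (The `usize` state here is purely illustrative.)
fn state_for(validation_states: &mut HashMap<String, usize>, block: String) -> usize {
    *validation_states
        .entry(block)
        .or_insert_with_key(|key| key.len())
}

fn main() {
    let mut validation_states = HashMap::new();
    // First call computes the state from the key; second call hits the cache.
    assert_eq!(state_for(&mut validation_states, "block-hash".into()), 10);
    assert_eq!(state_for(&mut validation_states, "block-hash".into()), 10);
}
```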
```rust
.flat_map(|deploy_hash| {
    let chainspec = Arc::clone(&chainspec);
    // For every request, increase the number of in-flight...
    in_flight.inc(deploy_hash);
```
in_flight uses a HashMap internally, so this might end up doing a lot of hashing when we increment the number of in-flight requests for every deploy in a block.
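A minimal sketch of the in-flight counter being discussed (illustrative types; the real component differs): each `inc` call performs one hash of the key, so a block with many deploys incurs one hash per deploy.

```rust
use std::collections::HashMap;

/// Counts in-flight fetch requests per deploy hash. Every `inc`/`count`
/// hashes the 32-byte key once to locate its HashMap bucket.
#[derive(Default)]
struct InFlight {
    counts: HashMap<[u8; 32], u32>,
}

impl InFlight {
    fn inc(&mut self, deploy_hash: [u8; 32]) {
        // One hash of `deploy_hash` per call.
        *self.counts.entry(deploy_hash).or_insert(0) += 1;
    }

    fn count(&self, deploy_hash: &[u8; 32]) -> u32 {
        self.counts.get(deploy_hash).copied().unwrap_or(0)
    }
}

fn main() {
    let mut in_flight = InFlight::default();
    let hash = [7u8; 32];
    in_flight.inc(hash);
    in_flight.inc(hash);
    assert_eq!(in_flight.count(&hash), 2);
}
```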
```rust
) {
    Event::DeployFound(deploy_hash)
} else {
    Event::DeployMissing(deploy_hash)
```
I am worried that this is swallowing the real reason this block cannot be validated – it turns an invalid deploy into a missing one. It's probably fine for now; I just want to put it out there.
True, after all the work of fetching the deploys, we are returning a bool. I would like to rethink a few aspects of the block_validator at some point - for instance, batching the deploy fetching.
> batching the deploy fetching.
Oh yes, definitely that 👍 💯
goral09
left a comment
Looks good, no blockers from me.
bors r+

Build succeeded:
```diff
-let deploy_valid = header.timestamp() + header.ttl() >= block_timestamp;
-let num_deps_valid = header.dependencies().len() <= deploy_config.max_dependencies as usize;
-ttl_valid && timestamp_valid && deploy_valid && num_deps_valid && all_deps_resolved()
+header.is_valid(deploy_config, block_timestamp) && all_deps_resolved()
```
This seems to be `self.deploy_config` currently. Will this be upgradeable? Then we will need to use the deploy config version appropriate for that block height/era. (Is there a ticket for that?)
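For reference, a simplified, hypothetical sketch of the checks folded into `is_valid` in the diff above (types and units are illustrative, not the crate's actual signatures):

```rust
// Illustrative stand-ins for the real types.
struct DeployConfig {
    max_dependencies: u8,
}

struct DeployHeader {
    timestamp: u64, // milliseconds since epoch
    ttl: u64,       // milliseconds
    dependencies: Vec<[u8; 32]>,
}

impl DeployHeader {
    /// A deploy is valid for a block if its timestamp plus TTL does not
    /// precede the block timestamp, and its dependency count stays within
    /// the configured maximum.
    fn is_valid(&self, deploy_config: &DeployConfig, block_timestamp: u64) -> bool {
        let deploy_valid = self.timestamp + self.ttl >= block_timestamp;
        let num_deps_valid =
            self.dependencies.len() <= deploy_config.max_dependencies as usize;
        deploy_valid && num_deps_valid
    }
}

fn main() {
    let config = DeployConfig { max_dependencies: 2 };
    let header = DeployHeader {
        timestamp: 1_000,
        ttl: 500,
        dependencies: vec![],
    };
    // Still live at block timestamp 1_400, expired by 2_000.
    assert!(header.is_valid(&config, 1_400));
    assert!(!header.is_valid(&config, 2_000));
}
```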
With this PR I started out trying to solve NDRS-668:

That should have been it, but it turned out that `BlockValidator` was the source of many extra events on the queue: the `BlockValidator` would always fetch all deploys when a `BlockValidationRequest` was received, regardless of whether it had already validated that block, or whether a previous request was in-flight for it. This PR therefore also only fetches the deploys for a block if it has not been successfully validated in the past. If it has been successfully validated, we can exit early, returning the validation result to the caller.

I also add a `Debug` dump of the queue items to file (piggybacking on the already written JSON dump): you can send a SIGUSR1 signal to get a dump in `queue_dump_debug.txt`, which can aid in debugging the queue. There are hashes present in this debug dump, making it possible to correlate requests/responses and other messages in general.
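The early-exit behaviour described above can be sketched as follows. This is an illustrative model, not the node's actual types: block hashes are simplified to integers and the deploy fetch is reduced to a counter.

```rust
use std::collections::HashMap;

/// Sketch of the caching behaviour: once a block has been validated
/// successfully, repeat requests are answered from the cache instead of
/// re-fetching every deploy.
struct Validator {
    validated: HashMap<u64, bool>, // block hash (simplified) -> result
    fetches: u32,                  // counts how often deploys were fetched
}

impl Validator {
    fn validate(&mut self, block: u64) -> bool {
        if let Some(&result) = self.validated.get(&block) {
            return result; // already validated: exit early, no fetching
        }
        self.fetches += 1; // stands in for fetching all deploys of the block
        let result = true; // assume the deploys check out in this sketch
        self.validated.insert(block, result);
        result
    }
}

fn main() {
    let mut validator = Validator { validated: HashMap::new(), fetches: 0 };
    assert!(validator.validate(42));
    assert!(validator.validate(42)); // second request served from the cache
    assert_eq!(validator.fetches, 1); // deploys were only fetched once
}
```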