Rehash retrieval proxies / multiple transports / inconsistent source attack protection (aka "smart retrievals") #2241
Conversation
This is still a draft because some tests still need to be fixed, and I want to improve some other parts.
node/src/actors/rad_manager/mod.rs
Outdated
-pub struct RadManager;
+pub struct RadManager {
+    /// Addresses of proxies to be used as extra transports when performing data retrieval.
+    proxies: Vec<String>,
Proxy addresses are only validated when executing a data request, so if the user inputs an invalid proxy they have no way to detect it until their node can resolve a data request.
So it would be nice to have either a way to validate the proxies on startup, or a JSON-RPC command to test the execution of a data request using proxies. If the old "try data request" command uses these proxies we can refer to it in the documentation as the expected way to ensure that the node is configured correctly.
So it would be nice to have either a way to validate the proxies on startup
Let me give further thought to how to approach this properly.
or a JSON-RPC command to test the execution of a data request using proxies. If the old "try data request" command uses these proxies we can refer to it in the documentation as the expected way to ensure that the node is configured correctly.
I think I prefer a more automated way 🤔
In any case, try_data_request is not currently using the proxies (as it implements its own retrieval logic that is not unified with that of RadManager). I guess it would be nice to have it use the proxies though, for consistency with the actual witnessing behavior of the node and, as you pointed out, as a fun way to test the proxy functionality.
@tmpolaczyk which of these do you think is the right approach to performing validation of proxies upon startup?
- Simply validating the addresses
- Checking that we can connect to a socket at those addresses
- Somehow handshaking with the proxy (different for each supported proxy protocol)
- Running try_data_request and expecting a success

Especially in this last case, I think I would give the option of disabling the validation through configuration, so that a downtime of the data sources doesn't prevent Witnet nodes from (re)starting.
Validating the addresses should be done to detect typos. Checking that we can connect or handshaking may not always work, so I prefer a more end-to-end test.
When running try_data_request from RadManager it is really hard to guess which proxy failed: the logs never mention the proxy URL. So we need to run try_data_request once per proxy, and add a log that clearly says if a proxy failed (and stop the node in case of failure?). Creating a test data request to google.com should work. I guess we can simply add a config variable like witnessing.http_test_url = "https://google.com", and allow setting it to false to disable that check.
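For illustration, here is one way the discussed settings could be modeled; the struct and field names (`WitnessingConfig`, `proxies`, `http_test_url`) follow this conversation rather than the PR's actual configuration schema, and disabling the check is modeled here as simply omitting the test URL:

```rust
// Hypothetical sketch of the configuration discussed above; not the PR's actual schema.
use serde::Deserialize;

#[derive(Debug, Default, Deserialize)]
#[serde(default)]
pub struct WitnessingConfig {
    /// Addresses of proxies to be used as extra transports when retrieving data sources.
    pub proxies: Vec<String>,
    /// URL retrieved once per proxy at startup as an end-to-end test.
    /// Leaving it unset disables the check (instead of the `false` value suggested above).
    pub http_test_url: Option<String>,
}
```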
Basic URL validation functionality is now in place:
$ RUST_LOG=witnet_rad=debug cargo run --release -- -c ~/.witnet/config/witnet.toml wallet server
Loading config from: witnet.toml
Setting log level to: DEBUG, source: Env
[2022-08-11T17:49:25Z INFO witnet::cli] Sentry telemetry enabled
[2022-08-11T17:49:25Z DEBUG witnet_data_structures] Set environment to mainnet
[2022-08-11T17:49:25Z WARN witnet_wallet] The local Witnet node JSON-RPC server is configured to listen at 0.0.0.0:21338 but the wallet will connect to 127.0.0.1:21338
[2022-08-11T17:49:26Z INFO witnet_config::config] The default unproxied HTTP transport for retrieval is enabled.
[2022-08-11T17:49:26Z INFO witnet_config::config] Configuring retrieval proxies: ["socks5://127.0.0.1", "whatever://192.111.135.17:18302", "laleche"]
[2022-08-11T17:49:26Z INFO witnet_config::config] Paranoid retrieval percentage is set to 51%
Error: The following transport addresses are invalid:
- socks5://127.0.0.1 (the address is missing a port number)
- whatever://192.111.135.17:18302 ("whatever" is not a supported type of transport)
- laleche (relative URL without a base)
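For reference, a minimal sketch of the kind of check behind those messages, assuming the `url` crate; the function and constant names are illustrative rather than the PR's actual code, and the error strings are modeled on the output above:

```rust
// Illustrative sketch of basic proxy address validation using the `url` crate.
use url::Url;

const SUPPORTED_SCHEMES: &[&str] = &["http", "https", "socks4", "socks5"];

fn validate_proxy_address(address: &str) -> Result<(), String> {
    // "laleche" fails here with "relative URL without a base".
    let url = Url::parse(address).map_err(|e| format!("{} ({})", address, e))?;

    // "whatever://192.111.135.17:18302" fails here: unsupported transport.
    if !SUPPORTED_SCHEMES.contains(&url.scheme()) {
        return Err(format!(
            "{} (\"{}\" is not a supported type of transport)",
            address,
            url.scheme()
        ));
    }

    // "socks5://127.0.0.1" fails here: no explicit port and no known default for socks5.
    if url.port_or_known_default().is_none() {
        return Err(format!("{} (the address is missing a port number)", address));
    }

    Ok(())
}
```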
Thanks @tmpolaczyk for all your suggestions. I'll try to address them all today.
log::warn!("Refraining from committing to data request {} because the sources are apparently inconsistent", dr_pointer);
return Err(());
I believe that in this case it would be better for network safety to commit the RadError::InconsistentSource than to refrain from committing.
In #2050 we discuss possible attacks that arise from data requests that can only be solved by a small fraction of all the nodes. I believe the same arguments apply to the inconsistent source attack, because nodes that enable this inconsistent source attack protection will not participate in those requests.
Imagine an attacker that creates a data request that returns a bad value for some nodes in order to penalize them, the classical inconsistent source attack. If some nodes refrain from participating in that request, then the attacker will control a higher percentage of the quorum, leading to the attacker gaining more reputation. This is true until a majority of nodes enable inconsistent source attack protection; at that point the request will be resolved with a RadError::InsufficientCommits.
On the other hand, if the eligible nodes commit an error, then all the nodes will participate in solving this request, so it will be equally hard for the attacker to create an eligibility proof for this request as for any other request. If the majority of nodes have enabled inconsistent source attack protection, it is possible that the data request is solved with a radon error as a result, and in that case nobody gains reputation. And in the case where only a minority have enabled inconsistent source attack protection, there will be a few error commits in the data request, so the attack should still work, albeit with a slightly lower profit.
Hmmm... let me give that a thought. This is not a trivial point because what you are suggesting actually requires a protocol change to assign an identifier for InconsistentSource in RadonErrors.
Of course I had already created such an identifier, but this was a non-consensus-breaking change only because we never put that error into a commitment.
True, in that case we could commit a RadError::Unknown, and in a future pull request add the logic to enable a new RadonError using TAPI.
I have solved it like this:
.map_err(
    // The `InconsistentSource` error is mapped here for the sake of backwards
    // compatibility. Namely, to enable paranoid retrieval without having to
    // immediately introduce a breaking change that may jeopardize oracle
    // queries. The point of making the mapping here is to only affect actual
    // witnessing and committing, but not the `try_data_request` function, which
    // can rather use the `InconsistentSource` error for clarity.
    // TODO: pursue a WIP that introduces `InconsistentSource` as a proper
    // RadonError at the protocol level.
    |err| match err {
        RadError::InconsistentSource => RadError::Unknown,
        other => other,
    },
)
Interesting. As nodes are already refraining from committing in such cases... am I right to assume that we can change this part and start committing RadonError::Unknown in this "everything went wrong" case without breaking consensus?
In that case, we would start to see reveals with RadonError::Unknown, which currently are rare if not non-existing.
The thing is that the lack of commits caused by the code block you are pointing out is probably not even having an impact on oracle queries as of today, because those missing commits should be allowing other valid commits to take their place, under the assumption that there are always more eligible identities than necessary.
If that's the case, and we start to default to committing an error, that would most of the time lead to worse tallies than not doing so, because miners can't tell errors from values at the commitment stage.
Am I right to assume that we can change this part and start committing RadonError::Unknown in this "everything went wrong" case without breaking consensus?
Should be safe, because nodes can always commit an error. So the result would be the same as a sudden increase in http errors or timeout errors.
If that's the case, and we start to default to committing an error, that would most of the time lead to worse tallies than not doing so, because miners can't tell errors from values at the commitment stage.
True, then I guess the right approach would be to keep the old behavior for now, and investigate the consequences of changing it in the future. Would the ability to distinguish values from errors in the commit stage be useful? Or would it introduce even more attack vectors?
Would the ability to distinguish values from errors in the commit stage be useful? Or would it introduce even more attack vectors?
Idk, leaking that info at the commitment stage makes it too easy in theory for lazy witnesses not to perform the retrieval and simply copy the error from other commitments once they have seen rf * consensus_requirement of those (because at that point they will know in advance that the only value they can commit without being slashed must be an error).
On the other hand, that would allow skipping the reveal phase 🤔
I was thinking about adding a flag to the commit like is_error: bool; this way lazy witnesses can only guess the error. And I believe that if your error is different from the consensus error you miss the reward, or something like that, not sure because that logic is complex. That would allow miners to choose non-error commits, but it would also allow miners to choose error commits to sabotage a data request.
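For illustration only, a hypothetical sketch of that idea; the struct and field names are made up and this is not the actual witnet-rust commitment transaction:

```rust
// Hypothetical sketch of the `is_error` flag idea; all names are made up.
type Hash = [u8; 32];

pub struct CommitmentSketch {
    /// The data request this commitment belongs to.
    pub dr_pointer: Hash,
    /// Hash committing to the value that will later be revealed.
    pub commitment: Hash,
    /// Publicly visible flag stating that the committed value is a RADON error,
    /// without revealing which error until the reveal stage.
    pub is_error: bool,
}
```

Miners (and lazy witnesses) would then only see the flag, not the concrete error, which is what makes the guessing argument above possible.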
I just found a very funny issue: this PR breaks RNG requests because the nodes commit a "ModeTie" error instead of a random value.
When trying to use a public socks5 proxy, I get this error when trying a request with https sources (I didn't test http):
witnet.toml:
I tried other proxies and I got either the same error or a "retrieve timeout" error, so I'm not sure whether I had bad luck choosing the proxies or it doesn't work as expected for some reason.
I tried a few proxy addresses as well and found out that most of those that you can find out there are garbage, regardless of whether they are SOCKS or HTTP. The only one that worked like a charm was the SOCKS5 interface on port 9150 from my own Tor Browser. This added around 1-2 s of latency, but still all the retrievals were successful (except one API that gave me a 403 because they're behind a WAF or something).
That seems to be true. I finally got it working by running a local socks server.
tmpolaczyk left a comment
Some comments for the WitnetHttpClient.
node/tests/actors/rad_manager.rs
Outdated
// Drop RadManager explicitly and swallow the panic that doing so currently causes.
std::panic::catch_unwind(|| {
    std::mem::drop(manager);
})
.ok();
Ideally this shouldn't be needed. I believe there is a bug in the stop_system_if_panicking function, which panics if the actor is being dropped in a panicking thread while no actix system is running. When testing other actors they always run inside an actix system, so that's not a problem, but the RadManager is more standalone. So I believe that running inside an actix system should be optional, to avoid doing this workaround in other code that doesn't use actix. I'll see if it can be fixed.
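For what it's worth, a sketch of the guard that could make the workaround unnecessary, assuming actix-rt 2.x (where `System::try_current()` returns `None` when no system is running); the function name and signature here come from this discussion, not from the repository:

```rust
// Sketch only: a drop helper that tolerates being called outside an actix system.
pub fn stop_system_if_panicking(actor_name: &str) {
    if std::thread::panicking() {
        if let Some(system) = actix::System::try_current() {
            log::error!("Panicking while running {}, shutting the actix system down", actor_name);
            system.stop();
        } else {
            // No actix system (e.g. a plain unit test): just log instead of panicking again.
            log::error!("Panicking while running {} outside of an actix system", actor_name);
        }
    }
}
```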
So this hack can be removed now?
Proxy URLs are validated at startup, but there is no connection attempt to ensure that they work. There is no easy way to manually check that proxies work:
So unless I missed something there is no way to check whether the configured proxies work; it would be nice to have something like a test-request CLI command. In the node, when one proxy fails with a timeout error, this is the only log:
It would be nice if the error message could mention the URL of the proxy that failed.
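For illustration, a rough sketch of such a per-proxy check; it uses `reqwest` (blocking, with the `socks` feature) purely as a stand-in for the node's own HTTP client, and `http_test_url` refers to the `witnessing.http_test_url` setting suggested earlier:

```rust
// Illustrative per-proxy health check: one test retrieval through each configured
// proxy, logging the proxy URL on failure. Not the PR's actual retrieval code.
use std::time::Duration;

fn check_proxies(proxies: &[String], http_test_url: &str) -> Result<(), Vec<String>> {
    let mut failing = Vec::new();

    for address in proxies {
        let result = reqwest::blocking::Client::builder()
            .proxy(reqwest::Proxy::all(address.as_str()).expect("addresses validated at startup"))
            .timeout(Duration::from_secs(10))
            .build()
            .and_then(|client| client.get(http_test_url).send());

        match result {
            Ok(response) => log::info!("Proxy {} works ({})", address, response.status()),
            Err(error) => {
                // The proxy URL is part of the message, as requested above.
                log::warn!("Proxy {} failed the test retrieval: {}", address, error);
                failing.push(address.clone());
            }
        }
    }

    if failing.is_empty() {
        Ok(())
    } else {
        Err(failing)
    }
}
```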
In my mind I had implemented full validation (by querying some well-known URL). But it looks like that happened only in my mind.
Yes, it could be a separate PR. I suggested it here in case it makes us rethink the configuration parameters or something, but the basic structure of the witnessing params looks solid enough. Then, if I haven't missed any breaking changes, this is ready to merge.
…ansport as suggested by @tmpolaczyk
Made possible through reexporting `surf` as `witnet_net::surf`. As suggested by @tmpolaczyk
Also:
- Fix bug about detecting liars
- Add tests for `evaluate_paranoid_retrieval`
- Improve comments and documentation
- Fix usage of tally logic instead of aggregation logic when acting paranoid

As suggested by @tmpolaczyk. Thanks!
…mitting as suggested by @tmpolaczyk. TODO: pursue a WIP that introduces `InconsistentSource` as a proper RadonError at the protocol level.
This only validates that they are well-formed URLs and that they use a supported protocol (http, https, socks4 or socks5).
So as to prevent parsing and validating them every time they get used. As suggested by @tmpolaczyk
This is a rehash of #1523
Enables using proxies (HTTP, HTTPS, SOCKS4 and SOCKS5) when performing data retrieval. This allows retrieving data sources through different transports, so as to ensure that the data sources are consistent and that we take as small a risk as possible when committing to specially crafted data requests that may be ill-intended.
Just add this to the `[connections]` section of `witnet.toml` to proxy your retrievals through 3 different open proxies:
These addresses have been taken from this public list of proxies.
Close #1274
Close #1511