Rehash retrieval proxies / multiple transports / inconsistent source attack protection (aka "smart retrievals") #2241

Merged
aesedepece merged 34 commits into witnet:master from aesedepece:feat/smart-retrievals on Oct 3, 2022

Conversation

@aesedepece
Member

This is a rehash of #1523

Enables using proxies (HTTP, HTTPS, SOCKS4 and SOCKS5) when performing data retrieval. This allows retrieving data sources through different transports, so as to ensure that the sources are consistent and that we take as little risk as possible when committing to specially crafted data requests that may be ill-intended.

Just add this to the [connections] section of witnet.toml to proxy your retrievals through 3 different open proxies:

retrieval_proxies = [
        "socks5://181.102.35.195:1080",
        "socks5://8.210.60.118:1080",
        "socks5://47.114.8.133:1080"
]

These addresses have been taken from this public list of proxies.

Close #1274
Close #1511
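
For illustration only, here is a minimal sketch of the consistency check described above: retrieve the same source through every configured transport and only commit if a large enough share of them agree. The function name, types and threshold handling below are assumptions made for the example, not the actual RadManager implementation.

use std::collections::HashMap;

/// Sketch: `results` holds the value retrieved through each transport (the
/// default unproxied HTTP transport plus every configured proxy), and
/// `required_percentage` is the share of transports that must agree (e.g. 51).
fn looks_consistent(results: &[Result<String, String>], required_percentage: usize) -> bool {
    if results.is_empty() {
        return false;
    }
    // Count how many transports agree on each successfully retrieved value.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for value in results.iter().flatten() {
        *counts.entry(value.as_str()).or_insert(0) += 1;
    }
    // Commit only if a large enough share of the transports returned the same value.
    let majority = counts.values().copied().max().unwrap_or(0);
    majority * 100 >= required_percentage * results.len()
}

fn main() {
    let results = vec![
        Ok("1234.56".to_string()),           // default transport
        Ok("1234.56".to_string()),           // socks5 proxy #1
        Err("retrieve timeout".to_string()), // socks5 proxy #2
    ];
    // With a 51% threshold, 2 out of 3 agreeing transports is enough to commit.
    assert!(looks_consistent(&results, 51));
}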

@aesedepece aesedepece marked this pull request as draft July 29, 2022 16:18
@aesedepece
Member Author

This is still a draft because the tests have yet to be fixed, and I want to improve some other parts.

@aesedepece aesedepece marked this pull request as ready for review August 1, 2022 09:36
Contributor

@tmpolaczyk tmpolaczyk left a comment

Some comments.

pub struct RadManager;
pub struct RadManager {
    /// Addresses of proxies to be used as extra transports when performing data retrieval.
    proxies: Vec<String>,
Contributor

Proxy addresses are only validated when executing a data request, so if the user inputs an invalid proxy they have no way to detect it until their node tries to resolve a data request.

So it would be nice to have either a way to validate the proxies on startup, or a JSON-RPC command to test the execution of a data request using proxies. If the old "try data request" command uses these proxies we can refer to it in the documentation as the expected way to ensure that the node is configured correctly.

Member Author

So it would be nice to have either a way to validate the proxies on startup

Let me give this a further thought on how to approach it properly.

or a JSON-RPC command to test the execution of a data request using proxies. If the old "try data request" command uses these proxies we can refer to it in the documentation as the expected way to ensure that the node is configured correctly.

I think I prefer a more automated way 🤔

In any case, try_data_request is not currently using the proxies (as it implements its own retrieval logic that is not unified with that of RadManager). I guess it would be nice to have it use the proxies though, for it to be consistent with the actual witnessing behavior of the node and, as you pointed out, as a funny way to test the proxies functionality.

Member Author

@aesedepece aesedepece Aug 11, 2022

@tmpolaczyk which of these do you think is the right approach to performing validation of proxies upon startup?

  1. Simply validating the addresses
  2. Checking that we can connect to a socket at those addresses
  3. Somehow handshaking with the proxy (different for each supported proxy protocol)
  4. Running try_data_request and expecting a success

Especially in this last case, I think I would give the option of disabling the validation through configuration, so that downtime of the data sources doesn't prevent Witnet nodes from (re)starting.

Contributor

Validating the addresses should be done to detect typos. Checking that we can connect or handshaking may not always work, so I prefer a more end-to-end test.

When running try_data_request from RadManager it is really hard to guess which proxy failed, because the logs never mention the proxy URL. So we would need to run try_data_request once per proxy, and add a log that clearly says whether a proxy failed (and stop the node in case of failure?). Creating a test data request to google.com should work. I guess we can simply add a config variable like witnessing.http_test_url = "https://google.com", and allow false to disable that check.

Member Author

Basic URL validation functionality is now in place:

$ RUST_LOG=witnet_rad=debug cargo run --release -- -c ~/.witnet/config/witnet.toml wallet server
Loading config from: witnet.toml
Setting log level to: DEBUG, source: Env
[2022-08-11T17:49:25Z INFO  witnet::cli] Sentry telemetry enabled
[2022-08-11T17:49:25Z DEBUG witnet_data_structures] Set environment to mainnet
[2022-08-11T17:49:25Z WARN  witnet_wallet] The local Witnet node JSON-RPC server is configured to listen at 0.0.0.0:21338 but the wallet will connect to 127.0.0.1:21338
[2022-08-11T17:49:26Z INFO  witnet_config::config] The default unproxied HTTP transport for retrieval is enabled.
[2022-08-11T17:49:26Z INFO  witnet_config::config] Configuring retrieval proxies: ["socks5://127.0.0.1", "whatever://192.111.135.17:18302", "laleche"]
[2022-08-11T17:49:26Z INFO  witnet_config::config] Paranoid retrieval percentage is set to 51%
Error: The following transport addresses are invalid:
- socks5://127.0.0.1 (the address is missing a port number)
- whatever://192.111.135.17:18302 ("whatever" is not a supported type of transport)
- laleche (relative URL without a base)
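
For reference, a check along these lines could be written with the url crate (which also produces the "relative URL without a base" message seen above). This is only a sketch under that assumption, not the actual code in the PR:

use url::Url;

const SUPPORTED_SCHEMES: &[&str] = &["http", "https", "socks4", "socks5"];

/// Return a human-readable reason if `address` cannot be used as a transport.
fn validate_transport_address(address: &str) -> Result<(), String> {
    // "laleche" fails here with "relative URL without a base".
    let url = Url::parse(address).map_err(|err| err.to_string())?;
    // "whatever://192.111.135.17:18302" fails here: unsupported transport type.
    if !SUPPORTED_SCHEMES.contains(&url.scheme()) {
        return Err(format!("\"{}\" is not a supported type of transport", url.scheme()));
    }
    // "socks5://127.0.0.1" fails here: the address is missing a port number.
    if url.port().is_none() {
        return Err(String::from("the address is missing a port number"));
    }
    Ok(())
}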

@aesedepece
Member Author

Thanks @tmpolaczyk for all your suggestions. I'll try to address them all today.

@aesedepece aesedepece force-pushed the feat/smart-retrievals branch 3 times, most recently from 32b3307 to c0c65f8 on August 9, 2022 19:05
Comment on lines +557 to +558
log::warn!("Refraining from committing to data request {} because the sources are apparently inconsistent", dr_pointer);
return Err(())
Contributor

I believe in this case it would be better for network safety to commit the RadError::InconsistentSource than to refrain from committing.

In #2050 we discuss possible attacks that arise from data requests that can only be solved by a small fraction of all the nodes. I believe the same arguments apply to the inconsistent source attack, because nodes that enable this inconsistent source attack protection will not participate in those requests.

Imagine an attacker that creates a data request that returns a bad value for some nodes in order to penalize them, the classical inconsistent source attack. If some nodes refrain from participating in that request, then the attacker will control a higher percentage of the quorum, leading to the attacker gaining more reputation. This holds until a majority of nodes enable inconsistent source attack protection, at which point the request will be resolved with a RadError::InsufficientCommits.

On the other hand, if the eligible nodes commit an error, then all the nodes will participate in solving this request, so it will be equally hard for the attacker to create an eligibility proof for this request as for any other request. If the majority of nodes have enabled inconsistent source attack protection, it is possible that the data request is solved with a radon error as a result, and in that case nobody gains reputation. And in the case where only a minority have enabled inconsistent source attack protection, there will be a few error commits in the data request, so the attack should still work, with a slightly lower profit.

Member Author

Hmmm... let me give that a thought. This is not a trivial point because what you are suggesting actually requires a protocol change to assign an identifier for InconsistentSource in RadonErrors.

Member Author

Of course I had already created such an identifier, but this was a non-consensus-breaking change only because we are never putting that error into a commitment.

Contributor

True, in that case we could commit a RadError::Unknown, and in a future pull request add the logic to enable a new RadonError using TAPI.

Member Author

I have solved it like this:

.map_err(
    // The `InconsistentSource` error is mapped here for the sake of backwards
    // compatibility. Namely, to enable paranoid retrieval without having to
    // immediately introduce a breaking change that may jeopardize oracle
    // queries. The point of making the mapping here is to only affect actual
    // witnessing and committing, but not the `try_data_request` function, which
    // can rather use the `InconsistentSource` error for clarity.
    // TODO: pursue a WIP that introduces `InconsistentSource` as a proper
    //  RadonError at the protocol level.
    |err| match err {
        RadError::InconsistentSource => RadError::Unknown,
        other => other,
    },
)

Member Author

Interesting. As nodes are already refraining from committing in such cases... am I right to assume that we can change this part and start committing RadonError::Unknown in this "everything went wrong" case without breaking consensus?

In that case, we would start to see reveals with RadonError::Unknown, which currently are rare if not non-existing.

The thing is that the lack of commits caused by the code block you are pointing out is probably not even having an impact on oracle queries as of today, because those missing commits should be allowing other valid commits to take their place, under the assumption that there are always more eligible identities than necessary.

If that's the case, and we start to default to committing an error, that would most of the time lead to worse tallies than not doing so, because miners can't tell errors from values while at the commitment stage.

Contributor

Am I right to assume that we can change this part and start committing RadonError::Unknown in this "everything went wrong" case without breaking consensus?

Should be safe, because nodes can always commit an error. So the result would be the same as a sudden increase in http errors or timeout errors.

If that's the case, and we start to default to committing an error, that would most times lead to worse tallies than not doing so, because miners can't tell errors from values while at the commitment stage.

True, then I guess the right approach would be to keep the old behavior for now, and investigate the consequences of changing it in the future. Would the ability to distinguish values from errors in the commit stage be useful? Or would it introduce even more attack vectors?

Member Author

Would the ability to distinguish values from errors in the commit stage be useful? Or would it introduce even more attack vectors?

Idk, leaking that info at the commitment stage makes it too easy in theory for lazy witnesses not to perform the retrieval and simply copy the error from other commitments once they have seen rf * consensus_requirement of those (because at that point they will know in advance that the only value they can commit without being slashed must be an error).

Member Author

On the other hand, that would allow skipping the reveal phase 🤔

Contributor

I was thinking about adding a flag to the commit like is_error: bool; this way lazy witnesses can only guess the error. And I believe that if your error is different from the consensus error you miss the reward, or something like that; not sure, because that logic is complex. That would allow miners to choose non-error commits, but it would also allow miners to choose error commits to sabotage a data request.
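
Purely as a sketch of the idea being discussed (nothing like this exists in the protocol, and the names are made up for illustration):

/// Hypothetical extension of a commitment: the result stays hidden, but the
/// committer discloses whether it is an error or an actual value.
struct CommitWithErrorFlag {
    /// Hash of the (still secret) revealed value, as in a regular commitment.
    commitment_hash: [u8; 32],
    /// Whether the hidden result is an error; lazy witnesses could only guess
    /// this bit, and a wrong guess would cost them the reward.
    is_error: bool,
}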

@tmpolaczyk
Contributor

I just found a very funny issue: this PR breaks RNG requests because the nodes commit a "ModeTie" error instead of a random value.

@tmpolaczyk
Contributor

When trying to use a public socks5 proxy, I get this error when trying a request with https sources (didn't test http):

[2022-08-11T11:36:19Z WARN  isahc::handler] request completed with error: the server certificate could not be validated

witnet.toml:

proxies = [
        "socks5://72.217.216.239:4145",
]

I tried other proxies and I got either the same error or a "retrive timeout" error, so not sure if I had bad luck choosing the proxies or it doesn't work as expected for some reason.

@aesedepece
Member Author

I tried other proxies and I got either the same error or a "retrive timeout" error, so not sure if I had bad luck choosing the proxies or it doesn't work as expected for some reason.

I tried a few proxy addresses as well and found out that most of those that you can find out there are garbage, regardless of being SOCKS or HTTP.

The only one that worked like a charm was the SOCKS5 interface on port 9150 from my own Tor Browser. This added like 1-2s of latency but still all the retrievals were successful (except one API that gave me 403 because they're behind a WAF or something).

@tmpolaczyk
Contributor

I tried a few proxy addresses as well and found out that most of those that you can find out there are garbage, regardless of being SOCKS or HTTP.

That seems to be true. I finally got it working by running a local socks server.

@aesedepece aesedepece force-pushed the feat/smart-retrievals branch from 7d65860 to 1a2168d on August 12, 2022 10:19
Contributor

@tmpolaczyk tmpolaczyk left a comment

Some comments for the WitnetHttpClient.

@aesedepece aesedepece added this to the 1.5.3 milestone Aug 22, 2022
@aesedepece aesedepece self-assigned this Aug 22, 2022
@aesedepece aesedepece force-pushed the feat/smart-retrievals branch 2 times, most recently from 7fa65f4 to a3c7ee6 on August 24, 2022 11:33
@aesedepece aesedepece force-pushed the feat/smart-retrievals branch from a3c7ee6 to c59a11b on August 30, 2022 09:20
Comment on lines 25 to 29
// This is dropping RadManager explicitly so as to get rid of the panic of doing so.
std::panic::catch_unwind(|| {
    std::mem::drop(manager);
})
.ok();
Contributor

Ideally this shouldn't be needed; I believe there is a bug in the stop_system_if_panicking function, which panics if the actor is being dropped in a panicking thread while no actix system is running. When testing other actors they always run inside an actix system, so that's not a problem, but the RadManager is more standalone. So I believe that running inside an actix system should be optional, to avoid doing this workaround in other code that doesn't use actix. I'll see if it can be fixed.

Contributor

Done, #2276

Member Author

Thank you!

Contributor

So this hack can be removed now?

@tmpolaczyk
Contributor

Proxy URLs are validated at startup, but there is no connection attempt to ensure that they work.

There is no easy way to manually check that proxies work:

  • The witnet toolkit binary cannot be used to test proxies because it doesn't read the witnet.toml file.
  • The witnet node sendRequest command reads the witnet.toml file but ignores the proxy options.
  • Sheikah connected to the wallet reads the witnet.toml and uses the witnessing config, but in case of timeout there is some kind of bug and Sheikah displays the loading icon instead of the error. After waiting more than one minute it finally displays the error "Source looks inconsistent when...", but that error also doesn't mention which proxy failed.

So unless I missed something there is no way to check whether the configured proxies work; it would be nice to have something like a test-request CLI command.

In the node, when one proxy fails with a timeout error, this is the only log:

[2022-09-30T08:27:10Z ERROR witnet_node::actors::chain_manager::mining] Couldn't resolve rad request 86920ddf5d7066b1db629066ab50118855c05bf7fbf3d4ba53e1154c3ca02b43: Timeout during retrieval phase

It would be nice if the error message could mention the URL of the proxy that failed.
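
One possible (purely hypothetical) way to achieve that would be to tag each retrieval error with the transport that produced it, roughly like this sketch (the type and field names are made up for illustration):

use std::fmt;

/// Hypothetical wrapper that attaches the transport (proxy URL or "default")
/// to a retrieval error, so logs can name the transport that failed.
#[derive(Debug)]
struct TransportError {
    transport: String,
    message: String,
}

impl fmt::Display for TransportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // E.g. "Timeout during retrieval phase (via socks5://72.217.216.239:4145)"
        write!(f, "{} (via {})", self.message, self.transport)
    }
}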

@aesedepece
Member Author

Proxy URLs are validated at startup, but there is no connection attempt to ensure that they work.

In my mind I had implemented full validation (by querying some well-known URL). But it looks like that happened only in my mind.
Would you be OK if we go with this as it is and implement that or a CLI method (or both) in a further PR?

@tmpolaczyk
Contributor

Would you be OK if we go with this as it is and implement that or a CLI method (or both) in a further PR?

Yes, it could be a separate PR. I suggested it here in case that makes us rethink the configuration parameters or something, but the basic structure of the witnessing params looks solid enough.

Then if I haven't missed any breaking changes this is ready to merge.

Made possible through reexporting `surf` as `witnet_net::surf`.

As suggested by @tmpolaczyk
Also:
- Fix bug about detecting liars
- Add tests for `evaluate_paranoid_retrieval`
- Improve comments and documentation
- Fix usage of tally logic instead of aggregation logic when acting paranoid

As suggested by @tmpolaczyk. Thanks!
…mitting

as suggested by @tmpolaczyk

TODO: pursue a WIP that introduces `InconsistentSource` as a proper
 RadonError at the protocol level.
This is only validating that they are well-formed URLs and they are using a supported protocol (http, https, socks4 or socks5).
So as to prevent parsing and validating them every time they get used.

As suggested by @tmpolaczyk
@aesedepece aesedepece force-pushed the feat/smart-retrievals branch from 6c5fabb to 6a443bb on October 3, 2022 15:04
@aesedepece aesedepece merged commit 6a443bb into witnet:master Oct 3, 2022

Development

Successfully merging this pull request may close these issues.

Discussion: How to avoid "malicious" requests? (Inconsistent source attack)
Assesment: Smart retrievals for preventing Inconsistence source attack
