
fix: nonce implementation #1754

Merged
ifrit98 merged 3 commits into opentensor:staging from 0ximjosh:fix/nonce
Mar 25, 2024


@0ximjosh 0ximjosh commented Mar 21, 2024

name: Bug Fix Contribution
description: Use this template when contributing a bug fix.
labels: [bug, pull request]

Fix nonce implementation to guard against replay attacks

Nonce is shorthand for "number used once". We use nonces to protect
against replay attacks. Since receivers in the bittensor network are
decentralized, requiring a domain name for every receiver would be
self-defeating. This forces us to depend on HTTP communication, which is prone to
many more attacks than HTTPS. One of these is the replay attack, in which
a malicious agent intercepts a message being sent to a miner and
sends it again, "replaying" the request.

A nonce may only be used once, so a second request with the same nonce
must fail. To accomplish this, the server holds a dictionary of sender
identifiers -> last nonce and makes sure each new nonce is greater than the
previous one.

# bittensor/axon.py
endpoint_key = f"{synapse.dendrite.hotkey}:{synapse.dendrite.uuid}"

# Check the nonce from the endpoint key.
if (
    endpoint_key in self.nonces.keys()
    and self.nonces[endpoint_key] is not None
    and synapse.dendrite.nonce is not None
    and synapse.dendrite.nonce <= self.nonces[endpoint_key]
):
    raise Exception("Nonce is too small")

The problem here is that nonces are held only in memory. If the server restarts,
no nonce is held in memory, so a malicious user can freely send a duplicate
request.

To solve this, receivers should both keep the last nonce in memory
and require nonces to be UNIX timestamps within a predetermined delta of the
current time. A delta of 4 seconds was chosen because miners generally take a few
seconds to restart, and a request sent from a dendrite should be able to reach an
axon and start verification within 4 seconds, including network
latency. This way, if an attacker attempts to replay a message after the
receiver restarts, the replayed nonce timestamp will fall too far behind the
current time and be rejected.

Example Flow

Delta = 4
Request comes at timestamp 10
Received at timestamp 11
  |  Nonce > now - delta
  |  10    > 11  - 4
  |  Passes delta check. Nonce is within delta
Container restarts
Malicious user sends duplicate request at timestamp 15
  |  Nonce < now - delta
  |  10    < 15  - 4
  |  Fails delta check, nonce is too old
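
The arithmetic in the flow above can be sketched in a few lines of Python. This is only an illustration, not the code merged in this PR: the names `ALLOWED_DELTA` and `nonce_is_fresh` are hypothetical, and the strict `>` comparison is taken from the flow as written.

```python
ALLOWED_DELTA = 4  # seconds; the delta described in this PR (name is hypothetical)


def nonce_is_fresh(nonce: int, now: int, delta: int = ALLOWED_DELTA) -> bool:
    """Return True when the nonce timestamp is newer than now - delta."""
    return nonce > now - delta


# Request sent at 10, received at 11: 10 > 11 - 4, so the check passes.
first_attempt = nonce_is_fresh(10, now=11)

# Same nonce replayed at 15 after a restart: 10 > 15 - 4 is false, so it fails.
replay_attempt = nonce_is_fresh(10, now=15)

print(first_attempt, replay_attempt)
```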

Testing

There are 5 cases that need to be tested before this PR is merged.

  • First good Request
  • Second good Request
  • Second Request replayed
  • Second Request replayed after axon restarts
  • Third Request missing nonce entirely
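
One way to exercise the five cases above without a running axon is a small stand-in verifier. The sketch below is an assumption-laden illustration, not the bittensor implementation: `NonceGuard`, `verify`, and `accepted` are hypothetical names, timestamps are plain integers passed in by the caller, and rejecting a request with no nonce is assumed to be the desired behavior.

```python
from typing import Dict, Optional


class NonceGuard:
    """Hypothetical stand-in combining the in-memory check and the delta check."""

    def __init__(self, delta: int = 4) -> None:
        self.delta = delta
        self.nonces: Dict[str, int] = {}  # endpoint key -> last accepted nonce

    def verify(self, endpoint_key: str, nonce: Optional[int], now: int) -> None:
        if nonce is None:
            # Assumption: a request without a nonce is rejected outright.
            raise ValueError("Missing nonce")
        if nonce <= now - self.delta:
            raise ValueError("Nonce is too old")
        last = self.nonces.get(endpoint_key)
        if last is not None and nonce <= last:
            raise ValueError("Nonce is too small")
        self.nonces[endpoint_key] = nonce


def accepted(guard: NonceGuard, key: str, nonce: Optional[int], now: int) -> bool:
    """Return True if the guard accepts the request, False if it raises."""
    try:
        guard.verify(key, nonce, now)
        return True
    except ValueError:
        return False


key = "hotkey:uuid"  # same shape as endpoint_key in bittensor/axon.py
guard = NonceGuard()
results = [
    accepted(guard, key, 10, now=11),  # first good request
    accepted(guard, key, 12, now=13),  # second good request
    accepted(guard, key, 12, now=13),  # second request replayed
]

guard = NonceGuard()  # simulate an axon restart: in-memory nonces are lost
results.append(accepted(guard, key, 12, now=17))    # replay after restart
results.append(accepted(guard, key, None, now=18))  # request missing its nonce

print(results)
```

Only the first two requests should be accepted. The replay before the restart is caught by the in-memory check, and the replay after the restart is caught by the delta check.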

@ifrit98 ifrit98 self-requested a review March 25, 2024 19:58

@ifrit98 ifrit98 left a comment


LGTM, thanks!

@ifrit98 ifrit98 merged commit de0bd31 into opentensor:staging Mar 25, 2024
@ifrit98 ifrit98 mentioned this pull request Mar 25, 2024