
fix Synapse base performance (more than 10x speed up) #2161

Merged
thealligatorking merged 1 commit into opentensor:staging from backend-developers-ltd:fix_to_headers_speed on Aug 8, 2024

Conversation

@mjurbanski-reef (Contributor) commented Jul 23, 2024

Bug

bittensor's Synapse.to_headers() introduces a 100ms+ delay on both the server and client side, i.e. each bittensor request's RTT is overall delayed by 300ms more than it should be.

Description of the Change

While this was the wrong place (inside the loop) to "regenerate" the schema, this line was somehow "okay" in pydantic v1, but in v2 it is much slower: 1640b0d#diff-2a49a90fb91fba6743fa9a754231a89e98730b05e4ff55db2fa532e1915c126cR635

The problem with this is that some subnets reward the fastest miner, so adopting bittensor v7 would be detrimental for them.

The fixed code in this PR, compared to the previous version, results in a 10x speedup for a simple Synapse ping. The difference will likely be greater the more complex the Synapse object used.

A nice side effect is that after this change is released, it is in miners' best interest to update, for the same reward-scoring reason mentioned above. I'm pretty sure that after this it will be noticeably faster than v6, but I did not measure that, as everyone needs to update anyhow.
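The gist of the fix — computing the schema-derived data once per class instead of on every call — can be sketched in isolation. This is a toy stand-in, not the actual bittensor code; all class and field names here are illustrative:

```python
import functools


class Synapse:
    """Toy stand-in for bittensor's Synapse; all names are illustrative."""
    schema_builds = 0  # counts how often the "schema" is regenerated
    name = "ping"
    timeout = 5.0


@functools.lru_cache(maxsize=None)
def model_schema(cls):
    # Stand-in for an expensive pydantic-v2-style schema build.
    # lru_cache keyed on the class means this body runs once per class.
    cls.schema_builds += 1
    return {"properties": ("name", "timeout")}


def to_headers(synapse):
    # The core of the fix: fetch the schema once per class, not once per call.
    schema = model_schema(type(synapse))
    return {f"bt_header_{k}": str(getattr(synapse, k)) for k in schema["properties"]}


headers = [to_headers(Synapse()) for _ in range(1000)]
print(Synapse.schema_builds)  # prints 1: the schema was built only once
```

With per-call regeneration, the expensive build would run 1000 times here; with the cache it runs once, which is where the bulk of the speedup comes from.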

Number of queries: 10000
# Current master:
Average: 0.0439 seconds
Standard deviation: 0.0195 seconds
# After changes:
Average: 0.0027 seconds
Standard deviation: 0.0012 seconds
# After changes, but without cache
Average: 0.0083 seconds
Standard deviation: 0.0026 seconds

That is a more than 15x improvement.
Realistically, it is best to think of it as a "150ms" improvement per side.

Please note this code affects both the client (dendrite) and server (axon) side, so if only one of them is updated, the improvement will be half of that. This is important for miners who want to know the difference between updating or not (as they cannot influence the other side).
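To make the "half of that" point concrete, here is a quick back-of-the-envelope calculation using the loopback benchmark averages quoted above (illustrative arithmetic only; absolute real-world per-side gains will differ):

```python
old_avg = 0.0439  # s per call on master (both sides unpatched), from the benchmark
new_avg = 0.0027  # s per call with both sides patched

full_gain = old_avg - new_avg  # gain when client AND server are both updated
one_side_gain = full_gain / 2  # roughly what you get if only one side updates

print(f"both sides: {full_gain * 1000:.1f} ms, one side: {one_side_gain * 1000:.1f} ms")
```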

Alternate Designs

Not using a cache, but that was more than 3x slower (i.e. ~25ms per side).

Possible Drawbacks

None.

Verification Process

Existing unit tests already cover this behavior, plus performance testing using the following script:

import asyncio
import statistics
import bittensor


async def measure_ping_performance(axon_info, num_queries=10000, *, wallet):
    # Send num_queries empty Synapse pings and report average/stddev timings.
    dendrite = bittensor.dendrite(wallet=wallet)
    ping_times = []
    for i in range(num_queries):
        response = await dendrite.call(
            target_axon=axon_info,
            synapse=bittensor.Synapse(),
            timeout=5,
        )
        ping_time = response.dendrite.process_time
        ping_times.append(ping_time)

        print(f"Query#{i} response: {response!r}, Time taken: {ping_time:.4f} seconds")

    average_ping_time = sum(ping_times) / num_queries
    std_dev_ping_time = statistics.stdev(ping_times)

    print(f"Number of queries: {num_queries}")
    print(f"Average: {average_ping_time:.4f} seconds")
    print(f"Standard deviation: {std_dev_ping_time:.4f} seconds")


async def main():
    ip = "127.0.100.1"
    port = 8091
    wallet = bittensor.wallet()
    axon = bittensor.axon(
        ip=ip,
        external_ip=ip,
        port=port,
        wallet=wallet,
    )
    axon.start()
    await measure_ping_performance(axon.info(), wallet=wallet)


if __name__ == "__main__":
    asyncio.run(main())

Release Notes

  • simple "Synapse" call performance improved more than 10x

@mjurbanski-reef mjurbanski-reef force-pushed the fix_to_headers_speed branch 2 times, most recently from 24fc51b to 96224f6 Compare July 23, 2024 18:16
@mjurbanski-reef mjurbanski-reef changed the title fix Synapse base performance fix Synapse base performance (more than 10x speed up) Jul 24, 2024
@ibraheem-abe ibraheem-abe requested a review from a team July 24, 2024 16:28
@thewhaleking (Contributor) left a comment

This looks good, but I want to test it a bit.

@mjurbanski-reef (Contributor, Author)

@thewhaleking do you think I should add some additional test cases to the test suite?

@thewhaleking (Contributor)

> @thewhaleking do you think I should add some additional test cases to the test suite?

Additional test cases are always welcome. There was some chatter about cache invalidation, but I don't see how that would apply here. I could be missing something, however.

@mjurbanski-reef (Contributor, Author)

AFAIK there is no cache invalidation issue here, and a use case where it would be needed has not been presented so far.

As for the additional test cases: it is my belief that the one already attached should suffice, but you mentioned some additional testing, so I wondered if I missed something.

@thewhaleking (Contributor)

> As for the additional test cases - it is my belief that the one already attached should suffice, but you mentioned some additional testing, so I wondered if I missed something.

I just meant testing as in running it locally and seeing if there was any way I could make it go wrong. I have not encountered a situation in which this occurs. I would like to get this merged early this week.

@mjurbanski-reef (Contributor, Author)

Two weeks have passed since this PR was created. Is there anything I can do to help move this along?

thewhaleking previously approved these changes Aug 6, 2024
@ibraheem-abe (Contributor)

@mjurbanski-reef Thanks for the contribution! Can you please rebase this off of staging so we can merge it?

@mjurbanski-reef mjurbanski-reef changed the base branch from master to staging August 8, 2024 15:16
@mjurbanski-reef mjurbanski-reef dismissed thewhaleking’s stale review August 8, 2024 15:16

The base branch was changed.

@mjurbanski-reef (Contributor, Author)

@ibraheem-opentensor rebased. The only failing test is now the same one that fails on staging: https://github.com/opentensor/bittensor/actions/runs/10300011141/job/28508536572

@thealligatorking thealligatorking merged commit 8c6909c into opentensor:staging Aug 8, 2024
@ibraheem-abe ibraheem-abe mentioned this pull request Aug 23, 2024