Skip to content

⚡️(api) improve performance of HTTP basic auth#337

Merged
wilbrdt merged 2 commits intomasterfrom
improve-auth-performance
Jun 7, 2023
Merged

⚡️(api) improve performance of HTTP basic auth#337
wilbrdt merged 2 commits intomasterfrom
improve-auth-performance

Conversation

@wilbrdt
Copy link
Contributor

@wilbrdt wilbrdt commented May 23, 2023

Purpose

HTTP Basic auth implementation in Ralph is using the secure and standard bcrypt algorithm to hash/salt passwords before storing them.
But this implementation comes with a performance cost, as each request has an overhead of around 200ms.

Proposal

By LRU caching the HTTP auth credentials, we should greatly decrease the overhead of the vast majority of requests.

  • add LRU caching
  • test with a timing middleware on endpoint /whoami
INFO:     Application startup complete.
INFO:ralph.api:TIMING: Wall:  209.6ms | CPU:  209.8ms | app.ralph.api.whoami (HTTP Basic Auth overhead)
INFO:ralph.api:TIMING: Wall:  210.0ms | CPU:  210.2ms | app.ralph.api.whoami
INFO:     172.26.0.1:35240 - "GET /whoami HTTP/1.1" 200 OK
INFO:ralph.api:TIMING: Wall:    0.5ms | CPU:    0.5ms | app.ralph.api.whoami (HTTP Basic Auth overhead)
INFO:ralph.api:TIMING: Wall:    0.7ms | CPU:    0.7ms | app.ralph.api.whoami
INFO:     172.26.0.1:43642 - "GET /whoami HTTP/1.1" 200 OK
INFO:ralph.api:TIMING: Wall:    0.4ms | CPU:    0.4ms | app.ralph.api.whoami (HTTP Basic Auth overhead)
INFO:ralph.api:TIMING: Wall:    0.5ms | CPU:    0.6ms | app.ralph.api.whoami
INFO:     172.26.0.1:43644 - "GET /whoami HTTP/1.1" 200 OK
INFO:ralph.api:TIMING: Wall:    0.5ms | CPU:    0.4ms | app.ralph.api.whoami (HTTP Basic Auth overhead)
INFO:ralph.api:TIMING: Wall:    0.7ms | CPU:    0.6ms | app.ralph.api.whoami
INFO:     172.26.0.1:43658 - "GET /whoami HTTP/1.1" 200 OK
  • test on near-real conditions with LBT
    auth_caching

@wilbrdt wilbrdt self-assigned this May 23, 2023
@wilbrdt wilbrdt force-pushed the improve-auth-performance branch 2 times, most recently from b51674c to 94deb57 Compare May 23, 2023 16:55
@wilbrdt wilbrdt linked an issue May 23, 2023 that may be closed by this pull request
@wilbrdt wilbrdt force-pushed the improve-auth-performance branch from 94deb57 to 2d8ac37 Compare May 23, 2023 16:57
Copy link
Contributor

@jmaupetit jmaupetit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's wonderful and promising, wow!

Maybe you should also add python tests ensuring we hit the cache on subsequent requests?

Copy link
Contributor

@Leobouloc Leobouloc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me (I agree with comments by @jmaupetit concerning the default values). One suggestion:

Maybe only cache results with valid credentials to avoid the following situation:

a user (probably an admin in this situation) tries to connect with his credentials, then realizes they are not valid and proceeds to create an account with these credentials. The user would be locked out of the account for the next hour (probably without knowing why) as the credentials were cached as invalid.

This would not lead to a timing exploit or decrease performance for users.

@wilbrdt
Copy link
Contributor Author

wilbrdt commented May 30, 2023

Maybe only cache results with valid credentials to avoid the following situation:

a user (probably an admin in this situation) tries to connect with his credentials, then realizes they are not valid and proceeds to create an account with these credentials. The user would be locked out of the account for the next hour (probably without knowing why) as the credentials were cached as invalid.

This would not lead to a timing exploit or decrease performance for users.

For now, we are caching what's in the auth.json file, so adding/removing a user is only taken into account after a server reboot.
Moreover, if someone tries to connect with invalid credentials, an exception is raised and nothing is returned, so I don't think the invalid credentials are cached (I might be wrong though).

@wilbrdt
Copy link
Contributor Author

wilbrdt commented May 30, 2023

Maybe you should also add python tests ensuring we hit the cache on subsequent requests?

We would either need to do a timing test, or to activate with info=True instrumentation that gives us access to cache_info(), but that would induce a performance penalty for each requests.

@wilbrdt wilbrdt force-pushed the improve-auth-performance branch from 2d8ac37 to 67d9979 Compare May 30, 2023 15:25
@SergioSim
Copy link
Collaborator

SergioSim commented May 30, 2023

We would either need to do a timing test, or to activate with info=True instrumentation that gives us access to cache_info(), but that would induce a performance penalty for each requests.

Maybe we could directly call the authenticated_user method in a unit test (without starting the server) to check that a new entry is added to its cache property?

@Leobouloc
Copy link
Contributor

Maybe only cache results with valid credentials to avoid the following situation:

a user (probably an admin in this situation) tries to connect with his credentials, then realizes they are not valid and proceeds to create an account with these credentials. The user would be locked out of the account for the next hour (probably without knowing why) as the credentials were cached as invalid.

This would not lead to a timing exploit or decrease performance for users.

For now, we are caching what's in the auth.json file, so adding/removing a user is only taken into account after a server reboot.

Good point concerning get_stored_credentials being cached. Is this necessary ? As long as we are using a file to store Basic Auth information, I assume we are dealing with a low number of users, so maybe we could do without, this would allow to create new users without rebooting the server.

Moreover, if someone tries to connect with invalid credentials, an exception is raised and nothing is returned, so I don't think the invalid credentials are cached (I might be wrong though).

My bad, I thought cachetools was also caching Exceptions 🙃

@wilbrdt
Copy link
Contributor Author

wilbrdt commented Jun 5, 2023

Good point concerning get_stored_credentials being cached. Is this necessary ? As long as we are using a file to store Basic Auth information, I assume we are dealing with a low number of users, so maybe we could do without, this would allow to create new users without rebooting the server.

@Leobouloc We could probably remove get_stored_credentials from being cached so we can cache users without rebooting indeed.
It would be better to do that in another PR, as we would need to check if opening/closing the file for each request is fast enough, or if we would need to cache the file based on its last modified time?

@wilbrdt wilbrdt force-pushed the improve-auth-performance branch 2 times, most recently from 65f9ad6 to 5e7ef64 Compare June 5, 2023 16:22
@wilbrdt wilbrdt requested review from Leobouloc and jmaupetit June 6, 2023 10:10
Copy link
Collaborator

@SergioSim SergioSim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome! 👍
image

@Leobouloc
Copy link
Contributor

Leobouloc commented Jun 7, 2023

Good point concerning get_stored_credentials being cached. Is this necessary ? As long as we are using a file to store Basic Auth information, I assume we are dealing with a low number of users, so maybe we could do without, this would allow to create new users without rebooting the server.

@Leobouloc We could probably remove get_stored_credentials from being cached so we can cache users without rebooting indeed. It would be better to do that in another PR, as we would need to check if opening/closing the file for each request is fast enough, or if we would need to cache the file based on its last modified time?

My guess is that as long, as we are using a single file (auth.json) to store user credentials, we are dealing with a limited amount of users, for which credentials would all fit in the cache. This means reading the file would only be done in the following cases: 1/ a new user was added 2/ a user provided wrong credentials . Both cases would be pretty infrequent, and therefore would not affect performance in a significant manner.

Copy link
Contributor

@Leobouloc Leobouloc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me. I left a few small suggestions.

@wilbrdt wilbrdt force-pushed the improve-auth-performance branch from 5e7ef64 to a9407a5 Compare June 7, 2023 13:30
wilbrdt added 2 commits June 7, 2023 16:32
HTTP Basic auth implementation in Ralph is using the secure and standard
bcrypt algorithm to hash/salt passwords before storing them.
But this implementation comes with a performance cost, as each request has an
overhead of almost 200ms.
By LRU caching the HTTP auth credentials, we should greatly decrease the
overhead of the vast majority of the requests.
Gitlint logs a warning when using a rule that uses a custom regex and
`regex-style-search` option is not enabled. Now enabling this option to avoid
the warning
@wilbrdt wilbrdt force-pushed the improve-auth-performance branch from a9407a5 to 454ecfd Compare June 7, 2023 14:33
@wilbrdt wilbrdt merged commit 5d7d632 into master Jun 7, 2023
@wilbrdt wilbrdt deleted the improve-auth-performance branch June 7, 2023 16:53
piptouque pushed a commit to piptouque/ralph that referenced this pull request Jan 16, 2026
Changes :
- Cache the credentials file based on
when it was last modified.
- Also cache the users' decrypted data
based on that information.

Rationale :
Right now, modifying the basic auth
credentials file (adding a user, for instance)
requires a server restart.
This is because of caching at
the credentials file level.
Following the dicussion on [a previous PR](openfun#337 (comment))
we propose to  add the last modified time
of the credentials file to
the chache key for the
credentials file and the
users' data both.
If caching of the credentials file
seems unnecessary
(as was discussed in the above PR),
 it may be removed without
 breaking this feature.
piptouque pushed a commit to piptouque/ralph that referenced this pull request Jan 16, 2026
Changes:
- Cache the credentials file based on when it was last modified.
- Also cache the users' decrypted data based on that information.

Rationale:
Right now, modifying the basic auth
credentials file (adding a user, for instance)
requires a server restart.
This is because of caching at the credentials file level.
Following the dicussion on [a previous PR]
(openfun#337 (comment))
we propose to  add the last modified time
of the credentials file to
the cache key for the credentials file and the users' data both.
If caching of the credentials file seems unnecessary
(as was discussed in the above PR),
it may be removed without breaking this feature.
piptouque pushed a commit to piptouque/ralph that referenced this pull request Feb 3, 2026
Changes:
- Cache the credentials file based on when it was last modified.
- Also cache the users' decrypted data based on that information.

Rationale:
Right now, modifying the basic auth
credentials file (adding a user, for instance)
requires a server restart.
This is because of caching at the credentials file level.
Following the dicussion on [a previous PR]
(openfun#337 (comment))
we propose to  add the last modified time
of the credentials file to
the cache key for the credentials file and the users' data both.
If caching of the credentials file seems unnecessary
(as was discussed in the above PR),
it may be removed without breaking this feature.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Poor performance of LRS due to HTTP Basic Auth hashing

4 participants