This repository was archived by the owner on Jul 11, 2025. It is now read-only.

Add cache warmup #10

Merged
ConnectedSystems merged 7 commits into main from cache-warmup
Sep 23, 2024

Conversation

@ConnectedSystems
Collaborator

Please note that this PR modifies the expected config settings, notably renaming CACHE_DIR to the more descriptive TIFF_CACHE_DIR:

[prepped_data]
PREPPED_DATA_DIR = "C:/some_path_to_data/MPA/"

[server_config]
TIFF_CACHE_DIR = "<some location to cache geotiffs>"  # Previously, this was `CACHE_DIR` which is not very informative
REGIONAL_CACHE_DIR = "<some location to cache regional datasets>"
DEBUG_MODE = "false"  # Optional, disables file caching and displays debug logs
COG_THREADS = "2"  # Optional, number of threads to use when creating COGs (defaults to 1)
TILE_SIZE = "256"  # Optional, tile block size to use (defaults to 256)
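
As a rough illustration of how the optional settings and their defaults above could be consumed, here is a minimal sketch using Julia's TOML standard library. `read_server_config` is a hypothetical helper, not a ReefGuideAPI function, and the `/tmp/...` paths are placeholders; the package's actual config handling may differ.

```julia
using TOML

# Hypothetical helper (not part of ReefGuideAPI): read the [server_config]
# table and fill in the documented defaults for the optional keys.
function read_server_config(toml_text::AbstractString)
    cfg = TOML.parse(toml_text)
    server = cfg["server_config"]

    return (
        tiff_cache_dir = server["TIFF_CACHE_DIR"],          # required
        regional_cache_dir = server["REGIONAL_CACHE_DIR"],  # required
        # Values are stored as strings in the config, so parse them explicitly.
        debug_mode = lowercase(get(server, "DEBUG_MODE", "false")) == "true",
        cog_threads = parse(Int, get(server, "COG_THREADS", "1")),
        tile_size = parse(Int, get(server, "TILE_SIZE", "256")),
    )
end

# Placeholder paths, for illustration only.
example = """
[server_config]
TIFF_CACHE_DIR = "/tmp/tiffs"
REGIONAL_CACHE_DIR = "/tmp/regional"
COG_THREADS = "2"
"""

settings = read_server_config(example)
```

Because the required keys are indexed directly, a config that still uses the old CACHE_DIR name would fail with a `KeyError` under this sketch rather than silently falling back.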

With JLD2 (HDF5-based Julia data store)

using ReefGuideAPI

@time ReefGuideAPI.warmup_cache(".config.toml")
# 260.921239 seconds (1.17 G allocations: 66.052 GiB, 9.45% gc time, 13.53% compilation time: 6% of which was recompilation)

# Restart

@time ReefGuideAPI.warmup_cache(".config.toml")
# 215.732578 seconds (1.27 G allocations: 50.469 GiB, 53.58% gc time, 1.33% compilation time: 47% of which was recompilation)

Using built-in Serialization/Deserialization

using ReefGuideAPI

@time ReefGuideAPI.warmup_cache(".config.toml")
# 132.035940 seconds (266.51 M allocations: 18.211 GiB, 11.11% gc time, 31.49% compilation time: 2% of which was recompilation)

# After restart
# This should skip the cache warmup entirely and reuse the cache file on disk.
# If a new cache is desired, delete the old file.

@time ReefGuideAPI.warmup_cache(".config.toml")
# 83.431594 seconds (155.51 M allocations: 8.528 GiB, 7.52% gc time, 0.61% compilation time)

I went with the built-in Serialization/Deserialization method. It is faster (about 132 s vs 261 s on the first run, and 83 s vs 216 s after a restart) and the cache is much smaller on disk (2.7 GB vs 7.8 GB, though I'm not sure whether any compression was applied with JLD2).
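
For readers unfamiliar with the approach, the load-or-build pattern behind a Serialization-based cache can be sketched as follows. This is a minimal illustration and not the code in this PR: `load_or_build_cache`, the `build` callback, and the file name are all hypothetical.

```julia
using Serialization

# Hypothetical helper: `build` stands in for the expensive data preparation.
function load_or_build_cache(build::Function, cache_file::AbstractString)
    if isfile(cache_file)
        # Cache hit: skip the expensive step and read the stored object.
        return deserialize(cache_file)
    end

    data = build()
    dir = dirname(cache_file)
    isempty(dir) || mkpath(dir)  # make sure the cache directory exists
    serialize(cache_file, data)
    return data
end

# Small demonstration with a counter to show the build step runs only once.
cache_file = joinpath(mktempdir(), "regional_cache.dat")
build_calls = Ref(0)
build = () -> (build_calls[] += 1; Dict("region" => collect(1:5)))

first_load = load_or_build_cache(build, cache_file)   # builds and writes the file
second_load = load_or_build_cache(build, cache_file)  # reads the file instead
```

One trade-off worth keeping in mind: Julia's Serialization format is not guaranteed to be readable across different Julia versions or system images, so a cache written this way may need to be deleted and rebuilt after an upgrade.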

# After restart

ReefGuideAPI.start_server(".config.toml")
# This takes ~85 seconds, as indicated above.

A further alternative is to create the cache at compile time, but that requires more digging.

@PeterBaker0 @arlowhite could one of you test please?

I recently updated to Windows 11 and by default Julia is now blocked by the firewall (needs sys admin to change configs) so I can't test the REST API at the moment.

ConnectedSystems linked an issue on Sep 22, 2024 that may be closed by this pull request
Collaborator

@PeterBaker0 PeterBaker0 left a comment

As @arlowhite mentioned, we can't embed this into a Docker layer since it depends on runtime processing of mounted data. However, with a shared filesystem (e.g. EFS), it will still allow the work to be done only once rather than every time a container starts or restarts.
My only recommendation would be to, at some stage, ensure there is a unique hash such as "input data + code version + ..." associated with the file cache so that we don't have to go clean up stale data from the cache when processes change.

Thanks for doing this.
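
One possible shape for such a hash, as a sketch only: combine cheap file metadata (path, size, modification time) with a code version string and hash the result with the SHA standard library. `cache_key`, the version strings, and the file names below are hypothetical, and metadata is only a proxy for the actual file contents.

```julia
using SHA

# Hypothetical helper: derive a cache key from everything that should
# invalidate the cache when it changes.
function cache_key(data_files::Vector{String}, code_version::AbstractString)
    parts = String[]
    for path in sort(data_files)
        # Path, size and modification time are a cheap proxy for file contents.
        info = stat(path)
        push!(parts, string(path, '|', info.size, '|', info.mtime))
    end
    push!(parts, string("version|", code_version))
    return bytes2hex(sha256(join(parts, '\n')))
end

# Demonstration with a temporary stand-in for the input data.
dir = mktempdir()
data_file = joinpath(dir, "reef.dat")
write(data_file, "abc")

key_v1 = cache_key([data_file], "0.1.0")
key_v2 = cache_key([data_file], "0.2.0")

# Embedding (part of) the key in the file name means a stale cache is simply
# never matched, rather than having to be found and deleted.
cache_file = joinpath(dir, "regional_cache_$(key_v1[1:12]).dat")
```

Hashing the full file contents would be more robust than metadata, at the cost of reading every input file on each start.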

@ConnectedSystems
Collaborator Author

> My only recommendation would be to, at some stage, ensure there is a unique hash such as "input data + code version + ..." associated with the file cache so that we don't have to go clean up stale data from the cache when processes change.

I'll put this in as a separate issue for later enhancement. Cheers @PeterBaker0

Development

Successfully merging this pull request may close these issues.

Cache to disk server cold start