Add cache warmup by ConnectedSystems · Pull Request #10 · open-AIMS/ReefGuideAPI.jl

ConnectedSystems · 2024-09-22T02:54:37Z

Please note this PR modifies the expected config settings, notably renaming CACHE_DIR to something more appropriate:

[prepped_data]
PREPPED_DATA_DIR = "C:/some_path_to_data/MPA/"

[server_config]
TIFF_CACHE_DIR = "<some location to cache geotiffs>"  # Previously, this was `CACHE_DIR` which is not very informative
REGIONAL_CACHE_DIR = "<some location to cache regional datasets>"
DEBUG_MODE = "false"  # Optional, disables file caching and displays debug logs
COG_THREADS = "2"  # Optional, number of threads to use when creating COGs (defaults to 1)
TILE_SIZE = "256"  # Optional, tile block size to use (defaults to 256)
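
For anyone updating their own config, here is a minimal sketch of how the renamed keys and the optional settings could be read with Julia's `TOML` standard library. It is not the package's actual loading code: `int_setting` is a hypothetical helper, the cache paths are placeholders, and the `"false"` default for `DEBUG_MODE` is an assumption.

```julia
using TOML

# Example config text following the layout above; cache paths are placeholders.
config_text = """
[prepped_data]
PREPPED_DATA_DIR = "C:/some_path_to_data/MPA/"

[server_config]
TIFF_CACHE_DIR = "/tmp/tiff_cache"
REGIONAL_CACHE_DIR = "/tmp/regional_cache"
COG_THREADS = "2"
"""

config = TOML.parse(config_text)
server = config["server_config"]

# Hypothetical helper: settings are stored as strings, so parse them and
# fall back to the documented default when the key is absent.
function int_setting(section::AbstractDict, key::AbstractString, default::Int)
    return parse(Int, get(section, key, string(default)))
end

cog_threads = int_setting(server, "COG_THREADS", 1)  # present in the example
tile_size = int_setting(server, "TILE_SIZE", 256)    # absent, so the default applies

# Assumption: DEBUG_MODE is treated as "false" when not set.
debug_mode = lowercase(get(server, "DEBUG_MODE", "false")) == "true"
```

A config that still uses the old `CACHE_DIR` key would raise a `KeyError` on `server["TIFF_CACHE_DIR"]` in this sketch, which is the kind of breakage the note above is warning about.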

With JLD2 (HDF5-based Julia data store)

using ReefGuideAPI

@time ReefGuideAPI.warmup_cache(".config.toml")
# 260.921239 seconds (1.17 G allocations: 66.052 GiB, 9.45% gc time, 13.53% compilation time: 6% of which was recompilation)

# Restart

@time ReefGuideAPI.warmup_cache(".config.toml")
# 215.732578 seconds (1.27 G allocations: 50.469 GiB, 53.58% gc time, 1.33% compilation time: 47% of which was recompilation)

Using built-in Serialization/Deserialization

using ReefGuideAPI

@time ReefGuideAPI.warmup_cache(".config.toml")
# 132.035940 seconds (266.51 M allocations: 18.211 GiB, 11.11% gc time, 31.49% compilation time: 2% of which was recompilation)

# After restart
# With a cache file already on disk, the warmup is skipped and the cached data is loaded instead.
# If a new cache is desired, delete the old file.

@time ReefGuideAPI.warmup_cache(".config.toml")
# 83.431594 seconds (155.51 M allocations: 8.528 GiB, 7.52% gc time, 0.61% compilation time)

Obviously, I went with the direct Serialization/Deserialization method. Size on disk is much smaller too (2.7 GB vs 7.8 GB, though I'm not sure if any compression was applied with JLD2).
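
To illustrate the load-or-build pattern described above, here is a minimal sketch using the `Serialization` standard library. It is not the code in this PR: `build_regional_data`, `load_or_build`, and the `regional_cache.dat` file name are hypothetical stand-ins.

```julia
using Serialization

# Hypothetical stand-in for the expensive regional data setup.
function build_regional_data()
    return Dict("region_a" => rand(4, 4), "region_b" => rand(4, 4))
end

"""
Load cached data from `cache_dir` if a cache file exists; otherwise build the
data and write it to disk for the next start.
"""
function load_or_build(cache_dir::AbstractString; fn::AbstractString="regional_cache.dat")
    cache_path = joinpath(cache_dir, fn)
    if isfile(cache_path)
        @info "Loading cached data" cache_path
        return deserialize(cache_path)
    end

    @info "No cache found, building data" cache_path
    data = build_regional_data()
    mkpath(cache_dir)
    serialize(cache_path, data)
    return data
end

cache_dir = mktempdir()
first_run = load_or_build(cache_dir)   # builds the data and writes the cache file
second_run = load_or_build(cache_dir)  # reads the file written by the first call
```

One caveat with this approach: the `Serialization` docs caution that files are not guaranteed to be readable across different Julia versions, so a cache written by one Julia version may need to be deleted and rebuilt after an upgrade.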

# After restart

ReefGuideAPI.start_server(".config.toml")
# This takes ~85 seconds, as indicated above.

A further alternative is to create the cache at compile time, but that requires more digging.

@PeterBaker0 @arlowhite could one of you test please?

I recently updated to Windows 11 and by default Julia is now blocked by the firewall (needs sys admin to change configs) so I can't test the REST API at the moment.

PeterBaker0

As @arlowhite mentioned, we can't embed this into a Docker layer since it depends on runtime processing of mounted data. However, in the context of a shared filesystem (e.g. EFS), it will still allow the work to be done only once, rather than every time a container starts or restarts.
My only recommendation would be to, at some stage, ensure there is a unique hash such as "input data + code version + ..." associated with the file cache so that we don't have to go clean up stale data from the cache when processes change.

Thanks for doing this.
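
A rough sketch of what such a hash-keyed cache name could look like, using the `SHA` standard library. This is only an illustration of the idea, not something implemented in this PR: `cache_key`, the version strings, and the file names are all hypothetical.

```julia
using SHA

# Hypothetical helper: combine the code version with a content hash of each
# input file, so that changing either one produces a different cache key.
function cache_key(data_files::Vector{String}, code_version::AbstractString)
    parts = String[code_version]
    for f in sort(data_files)
        file_hash = bytes2hex(open(sha256, f))  # streams the file contents
        push!(parts, basename(f) * ":" * file_hash)
    end
    return bytes2hex(sha256(join(parts, "|")))
end

# Usage with a placeholder input file.
data_dir = mktempdir()
file_a = joinpath(data_dir, "a.tif")
write(file_a, "placeholder raster bytes")

key_v1 = cache_key([file_a], "v0.1.0")
key_v2 = cache_key([file_a], "v0.2.0")  # same data, different code version

# A shortened key in the file name keeps stale caches from being picked up.
cache_file = "regional_cache_$(key_v1[1:16]).dat"
```

Hashing full file contents is the simplest choice but could be slow for large datasets; hashing file sizes and modification times instead would be a cheaper, weaker alternative.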

ConnectedSystems · 2024-09-23T07:31:41Z

> My only recommendation would be to, at some stage, ensure there is a unique hash such as "input data + code version + ..." associated with the file cache so that we don't have to go clean up stale data from the cache when processes change.

I'll put this in as a separate issue for later enhancement. Cheers @PeterBaker0

ConnectedSystems added 7 commits September 22, 2024 12:42

Update dependencies

22a70b3

Switch bare print statements to @info

54aa493

Indicate updated config

a42b2dc

Add docstrings

2163c2a

Add warmup function

cd74b3d

Serialize data to disk-based cache

64776db

Use more informative error message

175e285

Remove use of bare `error()`

ConnectedSystems requested review from PeterBaker0 and arlowhite September 22, 2024 02:54

ConnectedSystems linked an issue Sep 22, 2024 that may be closed by this pull request

Cache to disk server cold start #8

Closed

PeterBaker0 approved these changes Sep 22, 2024

ConnectedSystems mentioned this pull request Sep 22, 2024

Error when calculating total cover open-AIMS/ADRIA.jl#855

Closed

ConnectedSystems mentioned this pull request Sep 23, 2024

Store regional data cache by unique hash #12

Open

ConnectedSystems merged commit 028f194 into main Sep 23, 2024

ConnectedSystems deleted the cache-warmup branch September 23, 2024 07:33
