Design a solution for caching downloads of $refs in order to improve performance in cases with many remote refs #452

@sirosen

Original use-case sourced from this PR: #451

The current caching capability significantly improves runtimes for remote schemas when there is a single remote file to download, but it does nothing for schemas that have remote refs to resolve. Resolved refs are cached in memory by the referencing library, but that cache is discarded between runs.

For faster runs, check-jsonschema should cache resolved refs on disk as well.

Some basic requirements:

  • this must respect the --no-cache setting
    • probably the same object which is used for fetching remote schemas should be passed to the ref resolver
  • filenames must be chosen such that there are no conflicts between different schemas (users won't be able to control filenames)
  • if the new file-and-dir layout for these data conflicts with the existing cache dir layout, that needs resolution
    • ideal: design a strategy to migrate cache data for the next 1-2 calendar years
    • acceptable: ignore old cache data, provide a changelog note on how to clean it up
  • the behavior here needs to be tested
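
To make the --no-cache requirement concrete, here is a minimal sketch of one object that could be shared by the schema downloader and the ref resolver. `CachedFetcher`, its fields, and the `fetch` callable are all hypothetical names for illustration, not check-jsonschema's actual API; the sketch assumes that `cache_dir=None` stands in for --no-cache.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class CachedFetcher:
    """Hypothetical shared helper; not part of check-jsonschema's real API."""

    fetch: Callable[[str], bytes]  # performs the actual download
    cache_dir: Optional[Path]  # None models --no-cache

    def get(self, uri: str, filename: str) -> bytes:
        if self.cache_dir is None:
            # --no-cache: never read from or write to disk
            return self.fetch(uri)
        path = self.cache_dir / filename
        if path.is_file():
            # cache hit: skip the download entirely
            return path.read_bytes()
        data = self.fetch(uri)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return data
```

Because the download function is injected, a test can pass a fake `fetch` that counts calls and assert that a second `get` for the same file does not download again, while a fetcher built with `cache_dir=None` downloads every time.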

Note

A friend of mine suggested putting cache data into a DB (e.g. sqlite) when we talked about this, so that it could be annotated with richer metadata and structure. Although that might be a good idea longer term, I don't want to reach for that quite yet -- I think this can be solved with a good dir structure for now.

Here's one initial idea, for evaluation:

  • each $ref is canonically named {md5 of the absolute URI}.json
  • in the ~/.cache/check_jsonschema/ dir, add a dir named refs/ (the schemas are in a dir named downloads/, which now seems like a suboptimal name but will suffice)
  • ref resolution stores resolved refs in the refs/ dir
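
The naming scheme in this idea can be sketched as follows. The function name `ref_cache_path` and the `cache_root` parameter are hypothetical, chosen only for illustration; the directory and the md5-based filename come from the bullets above.

```python
import hashlib
from pathlib import Path

# Cache root as described in the proposal above.
DEFAULT_CACHE_ROOT = Path.home() / ".cache" / "check_jsonschema"


def ref_cache_path(absolute_uri: str, cache_root: Path = DEFAULT_CACHE_ROOT) -> Path:
    """Map an absolute $ref URI to its on-disk location (hypothetical helper)."""
    # md5 is used only to derive a stable filename, not for security.
    digest = hashlib.md5(absolute_uri.encode("utf-8")).hexdigest()
    return cache_root / "refs" / f"{digest}.json"
```

Since the filename depends only on the absolute URI, two schemas that reference the same remote document would share one cache entry, and distinct URIs map to distinct filenames barring a hash collision. On Python 3.9+, `hashlib.md5` also accepts `usedforsecurity=False`, which may matter on systems that restrict md5.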

Labels: enhancement (New feature or request)
