Skip to content

Use bloom filter for quicker lookup #21

@patrickdemooij9

Description

@patrickdemooij9

Currently we do a text search to see if the URL exists in our table. Most of the time, this is not needed as the URL isn't listed in the database. We should use a bloom filter (https://benwendt.ca/articles/a-bloom-filter-in-c/) to determine if an URL could be in the database and then do the SQL request. This way we won't waste processing power/memory on the requests that we shouldn't do anything with.

Things to take into account:

  • What to do with deletions. Do we just pull all redirects and recreate our bloom filter? Or do we use an more extensive bloom filter that records the amount of bits?
  • Same for any changing of the Urls.
  • When do we initialize the bloom filter and is it really worth it in place of SQL? (check memory)
  • How does this work with regex lookups? I assume we cannot do that with the bloom filter, so we should check if we can even use bloom filter if we are still doing a lookup for regex every time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions