Skip to content

Feature/configurable database#71

Merged
PGijsbers merged 7 commits intomainfrom
feature/configurable_database
Oct 18, 2023
Merged

Feature/configurable database#71
PGijsbers merged 7 commits intomainfrom
feature/configurable_database

Conversation

@PGijsbers
Copy link
Contributor

This PR makes the database connection configurable through a configuration file (currently located at src/config.toml). The database credentials must be set through environment variables instead (which may be in a .env file). I am not happy with this implementation, but figured I would share for input.

We need our connections to be configurable as this will make it easier to set up in different environments (e.g., different developers, or staging vs production). For a lot of these values, we can provide sensible defaults which will work with the default development environment. Providing these defaults through an example configuration file makes it immediately easy to understand what is configurable.

It would then also be convenient to provide an easy way to override the configuration of the shipped configuration, which could be either through environment variables, or a separate configuration file (for example, by default located in something like ~/.config/openml-server or a file explicitly passed as command line argument). This is currently not supported.

I also excluded credentials from this file, since I do not want people to accidentally share credentials. This is why I put them in environment variables, but we could also work with a separate credentials file. In practice this would be the same (when working with .env files), so I don't really think this matters.

For the database connections we would expect information to be duplicate (e.g., where it server is located). However, toml does not support defaults natively, which is why I added a defaults subtable which propagates its values to sibling tables. I doubt that the added complexity is warranted (at this point), but I wanted to make it easier for myself to switch between the my local test server snapshot and the production server snapshot (that will be hosted in k8s).

I also think it might be useful to parse the configuration into some Pydantic class in the future to catch invalid configurations earlier and, for non-database configurations which may be used more frequently in the program, to access configurations through typed objects rather than dictionaries to aid IDE features.

Copy link
Member

@josvandervelde josvandervelde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, nice implementation!

Just some thoughts: if we'll ever want to use docker-compose, we probably want to be able to read the credentials and some configured variables inside the docker-compose.yaml (e.g. hostname and credentials of db). To be able to access to such variables, they should, as far as I know, be environment variables.

Currently, both the default-credentials and the other variables wouldn't be accessible inside docker-compose.yaml. With that in mind, would it be better to have different .env files, instead of the config.toml and the default values specified in config.py? We could something in the line of:

  • default.env
  • default-credentials.env
  • [profile].env
  • [profile]-credentials.env
    Where the profile could be a cmdline arg?

An additional reason to do this, would be to have similar mechanisms for all config-definition. They'll all be .env files, instead of some info in .env, some in .toml, some defaults in .py. A con would be that we'd have to resort to env vars without any hierarchy (as we do have in .toml).

Thoughts?

@PGijsbers
Copy link
Contributor Author

Discussed this on the talk but I had written some stuff down already, so for posterity:

It seems that the correct way to mount secrets into docker-compose is through use of the secrets mechanism. This writes contents of a file on the host to a temporary /run/secrets/ directory. Security-wise, there doesn't seem to be a big difference other than not accidentally exposing credentials if someone dumps all environment variables.

You could specify a .toml file as a secret. However, I don't think you can use the content of the .toml secret within the compose.yaml definitions, whereas you could with environment variables or single-line file secrets (single-line files seem to be supported as a way to provide credentials to e.g., a MySQL server).

So it seems that .env has the advantage of being able to be read in a docker compose step, whereas .toml makes for more readable configurations.

I don't see a problem with using two distinct mechanisms (one for credentials and one for all other configuration), but it's not immediately clear to me which configuration file we should prefer (.env or .toml).

@PGijsbers
Copy link
Contributor Author

I am going to go ahead and merge this PR, so we have something to work with. We will revisit this when get to integrating services through docker compose (soon).

@PGijsbers PGijsbers merged commit 8219ec8 into main Oct 18, 2023
@PGijsbers PGijsbers deleted the feature/configurable_database branch October 18, 2023 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants