Feature/configurable database by PGijsbers · Pull Request #71 · openml/server-api

PGijsbers · 2023-10-12T14:42:53Z

This PR makes the database connection configurable through a configuration file (currently located at src/config.toml). The database credentials must be set through environment variables instead (which may be in a .env file). I am not happy with this implementation, but figured I would share for input.

We need our connections to be configurable as this will make it easier to set up in different environments (e.g., different developers, or staging vs production). For a lot of these values, we can provide sensible defaults which will work with the default development environment. Providing these defaults through an example configuration file makes it immediately easy to understand what is configurable.

It would then also be convenient to provide an easy way to override the configuration of the shipped configuration, which could be either through environment variables, or a separate configuration file (for example, by default located in something like ~/.config/openml-server or a file explicitly passed as command line argument). This is currently not supported.

I also excluded credentials from this file, since I do not want people to accidentally share credentials. This is why I put them in environment variables, but we could also work with a separate credentials file. In practice this would be the same (when working with .env files), so I don't really think this matters.

For the database connections we would expect information to be duplicate (e.g., where it server is located). However, toml does not support defaults natively, which is why I added a defaults subtable which propagates its values to sibling tables. I doubt that the added complexity is warranted (at this point), but I wanted to make it easier for myself to switch between the my local test server snapshot and the production server snapshot (that will be hosted in k8s).

I also think it might be useful to parse the configuration into some Pydantic class in the future to catch invalid configurations earlier and, for non-database configurations which may be used more frequently in the program, to access configurations through typed objects rather than dictionaries to aid IDE features.

josvandervelde

Looks good to me, nice implementation!

Just some thoughts: if we'll ever want to use docker-compose, we probably want to be able to read the credentials and some configured variables inside the docker-compose.yaml (e.g. hostname and credentials of db). To be able to access to such variables, they should, as far as I know, be environment variables.

Currently, both the default-credentials and the other variables wouldn't be accessible inside docker-compose.yaml. With that in mind, would it be better to have different .env files, instead of the config.toml and the default values specified in config.py? We could something in the line of:

default.env
default-credentials.env
[profile].env
[profile]-credentials.env
Where the profile could be a cmdline arg?

An additional reason to do this, would be to have similar mechanisms for all config-definition. They'll all be .env files, instead of some info in .env, some in .toml, some defaults in .py. A con would be that we'd have to resort to env vars without any hierarchy (as we do have in .toml).

Thoughts?

PGijsbers · 2023-10-18T12:15:44Z

Discussed this on the talk but I had written some stuff down already, so for posterity:

It seems that the correct way to mount secrets into docker-compose is through use of the secrets mechanism. This writes contents of a file on the host to a temporary /run/secrets/ directory. Security-wise, there doesn't seem to be a big difference other than not accidentally exposing credentials if someone dumps all environment variables.

You could specify a .toml file as a secret. However, I don't think you can use the content of the .toml secret within the compose.yaml definitions, whereas you could with environment variables or single-line file secrets (single-line files seem to be supported as a way to provide credentials to e.g., a MySQL server).

So it seems that .env has the advantage of being able to be read in a docker compose step, whereas .toml makes for more readable configurations.

I don't see a problem with using two distinct mechanisms (one for credentials and one for all other configuration), but it's not immediately clear to me which configuration file we should prefer (.env or .toml).

PGijsbers · 2023-10-18T12:41:48Z

I am going to go ahead and merge this PR, so we have something to work with. We will revisit this when get to integrating services through docker compose (soon).

PGijsbers added 7 commits October 12, 2023 14:17

Read database configuration from file

c90a26b

Add python_dotenv to load environment variables from .env files

30a9a3d

Load database credentials from environment variables

af8650e

Only start app if needed, add config file path fixture

9c2999d

Don't automatically load configuration on import

a3aaa71

Add minimal tests for configuration loading

d99170b

Refactor to work with sibligns to make function more intuitive

edfabae

PGijsbers added the enhancement New feature or request label Oct 12, 2023

PGijsbers requested a review from josvandervelde October 12, 2023 14:42

PGijsbers mentioned this pull request Oct 16, 2023

Add documentation pages for installing and local development #74

Merged

5 tasks

josvandervelde approved these changes Oct 18, 2023

View reviewed changes

PGijsbers merged commit 8219ec8 into main Oct 18, 2023

PGijsbers deleted the feature/configurable_database branch October 18, 2023 12:42

PGijsbers mentioned this pull request Oct 18, 2023

Add docker-compose for spinning up docker instances #76

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature/configurable database#71

Feature/configurable database#71
PGijsbers merged 7 commits intomainfrom
feature/configurable_database

PGijsbers commented Oct 12, 2023

Uh oh!

josvandervelde left a comment

Uh oh!

PGijsbers commented Oct 18, 2023

Uh oh!

PGijsbers commented Oct 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

PGijsbers commented Oct 12, 2023

Uh oh!

josvandervelde left a comment

Choose a reason for hiding this comment

Uh oh!

PGijsbers commented Oct 18, 2023

Uh oh!

PGijsbers commented Oct 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants