Conversation
josvandervelde left a comment
Looks good to me, nice implementation!
Just some thoughts: if we'll ever want to use docker-compose, we probably want to be able to read the credentials and some configured variables inside `docker-compose.yaml` (e.g. the hostname and credentials of the db). As far as I know, such variables need to be environment variables to be accessible there.
Currently, neither the default credentials nor the other variables would be accessible inside `docker-compose.yaml`. With that in mind, would it be better to have different `.env` files, instead of `config.toml` and the default values specified in `config.py`? We could do something along the lines of:
- `default.env`
- `default-credentials.env`
- `[profile].env`
- `[profile]-credentials.env`
Where the `profile` could be a cmdline arg?
An additional reason to do this would be to have a similar mechanism for all config definitions: they would all be `.env` files, instead of some info in `.env`, some in `.toml`, and some defaults in `.py`. A con would be that we'd have to resort to env vars without any hierarchy (which we do have in `.toml`).
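A minimal sketch of how such layered `.env` files could be resolved, assuming the file names from the list above and a hypothetical `load_profile` helper (simple `KEY=VALUE` parsing, no quoting or interpolation as real dotenv libraries support):

```python
import os
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse a simple KEY=VALUE .env file, ignoring blanks and comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_profile(profile: str, directory: Path = Path(".")) -> dict[str, str]:
    """Merge env files; later files override earlier ones."""
    merged: dict[str, str] = {}
    for name in ("default.env", "default-credentials.env",
                 f"{profile}.env", f"{profile}-credentials.env"):
        merged.update(read_env_file(directory / name))
    # Real environment variables take precedence over any file.
    merged.update({k: v for k, v in os.environ.items() if k in merged})
    return merged
```

With this layout, a `prod.env` only needs to contain the values that differ from `default.env`.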
Thoughts?
Discussed this on the call, but I had written some stuff down already, so for posterity: it seems that the correct way to mount secrets into docker-compose is through the secrets mechanism. This writes the contents of a file on the host to a temporary file inside the container. You could specify a […] So it seems that […] I don't see a problem with using two distinct mechanisms (one for credentials and one for all other configuration), but it's not immediately clear to me which configuration file we should prefer (…).
I am going to go ahead and merge this PR, so we have something to work with. We will revisit this when we get to integrating services through docker compose (soon).
This PR makes the database connection configurable through a configuration file (currently located at `src/config.toml`). The database credentials must be set through environment variables instead (which may be in a `.env` file). I am not happy with this implementation, but figured I would share for input.

We need our connections to be configurable, as this will make it easier to set up in different environments (e.g., different developers, or staging vs. production). For a lot of these values, we can provide sensible defaults which will work with the default development environment. Providing these defaults through an example configuration file makes it immediately easy to understand what is configurable.
It would then also be convenient to provide an easy way to override the shipped configuration, either through environment variables or through a separate configuration file (for example, one located by default at something like `~/.config/openml-server`, or a file explicitly passed as a command-line argument). This is currently not supported.

I also excluded credentials from this file, since I do not want people to accidentally share credentials. This is why I put them in environment variables, but we could also work with a separate credentials file. In practice this would be the same (when working with `.env` files), so I don't really think this matters.

For the database connections we would expect information to be duplicated (e.g., where the server is located). However, `toml` does not support defaults natively, which is why I added a `defaults` subtable which propagates its values to sibling tables. I doubt that the added complexity is warranted (at this point), but I wanted to make it easier for myself to switch between my local test server snapshot and the production server snapshot (that will be hosted in k8s).

I also think it might be useful to parse the configuration into some Pydantic class in the future, to catch invalid configurations earlier and, for non-database configurations which may be used more frequently in the program, to access configuration through typed objects rather than dictionaries to aid IDE features.
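The `defaults` propagation could look roughly like this (the table and key names are invented for illustration; the PR's actual implementation may differ):

```python
def propagate_defaults(tables: dict) -> dict:
    """Copy values from the `defaults` subtable into its sibling tables.

    Values set explicitly in a sibling table win over the defaults.
    """
    defaults = tables.get("defaults", {})
    resolved = {}
    for name, table in tables.items():
        if name == "defaults" or not isinstance(table, dict):
            resolved[name] = table
            continue
        resolved[name] = {**defaults, **table}
    return resolved


# e.g. for two database connections sharing host/port defaults:
databases = {
    "defaults": {"host": "localhost", "port": 3306},
    "openml": {"database": "openml"},
    "expdb": {"database": "expdb", "host": "db.example.org"},
}
resolved = propagate_defaults(databases)
```

Here `openml` inherits both defaults, while `expdb` keeps its explicit `host` and only inherits the `port`.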