Merged
46 commits
59b73cf
Add mkdocs-material minimal example
PGijsbers Oct 13, 2023
f0be89e
Replace minimal example with README and CONTRIBUTING
PGijsbers Oct 13, 2023
dd01c44
Split up readme into github readme and documentation pages
PGijsbers Oct 13, 2023
cda0a98
Improve installation instructions
PGijsbers Oct 16, 2023
5ae72c8
Add information on YAML validation
PGijsbers Oct 16, 2023
3a81870
Add info for public database and building documentation
PGijsbers Oct 16, 2023
1c6888d
Remove code that was outside of codeblock
PGijsbers Oct 16, 2023
083321c
Add github workflow to deploying docs
PGijsbers Oct 16, 2023
c6fb107
Add source for workflow for later reference
PGijsbers Oct 16, 2023
88b19c8
Add minimal welcome page
PGijsbers Oct 16, 2023
ce6d28c
Add project information
PGijsbers Oct 16, 2023
6cc2d19
Further clarify that this is not REST API documentation itself
PGijsbers Oct 16, 2023
1f1b998
Use code-block titles
PGijsbers Oct 16, 2023
4fac239
Add Dockerfile for setting up PHP server
PGijsbers Oct 17, 2023
bff897c
Move documentation around, add section index, mark incomplete info
PGijsbers Oct 18, 2023
b2af709
Merge branch 'main' into add/docs
PGijsbers Oct 18, 2023
6c19dd3
Start dockerizing the test server database
PGijsbers Oct 19, 2023
b23181e
Add usage documentation for the docker image
PGijsbers Oct 19, 2023
846a89b
Add additional files for building PHP docker container
PGijsbers Oct 19, 2023
8842819
Rename image and add brief updated README
PGijsbers Oct 19, 2023
9c9789e
Move files up one level
PGijsbers Oct 19, 2023
af4096b
Add docker compose file to automatically spin up db and php api
PGijsbers Oct 19, 2023
9012cf5
Minor text clarifications
PGijsbers Oct 20, 2023
56eb32d
Bump to 3.12
PGijsbers Oct 20, 2023
de28106
Move php and database docker files to docker directory
PGijsbers Oct 20, 2023
f31eddd
Move stub files
PGijsbers Oct 20, 2023
ad048b2
Add docs service for serving mkdocs documentation
PGijsbers Oct 20, 2023
9007610
Add Dockerfile for running Python-based REST API
PGijsbers Oct 20, 2023
fc037f8
Set ignored files for docker builds
PGijsbers Oct 20, 2023
6a61982
Correctly load database configuration for users database
PGijsbers Oct 20, 2023
8bc78a9
Old server serves http minio instead of https
PGijsbers Oct 20, 2023
ddc8fe6
Change connection address, minio can now be compared
PGijsbers Oct 20, 2023
5cf7548
URL is no longer str, needs explicit cast
PGijsbers Oct 20, 2023
2e4caa2
Only 131 datasets in the test database
PGijsbers Oct 20, 2023
d9ca0ea
Add test database setup instructions
PGijsbers Oct 20, 2023
48dd659
Add information about using docker compose to run services
PGijsbers Oct 20, 2023
e160343
Add instructions to contribute documentation changes
PGijsbers Oct 20, 2023
53f0744
Add emoji extension
PGijsbers Oct 20, 2023
b384433
Add general contribution introduction
PGijsbers Oct 20, 2023
78f6474
Make sure all documentation information is in dedicated section
PGijsbers Oct 20, 2023
9aafcdc
Add mkdocs-section-index requirement
PGijsbers Oct 20, 2023
1f8aabe
Fix typo in site_url
PGijsbers Oct 20, 2023
74cf33c
Minor text changes and corrections
PGijsbers Oct 21, 2023
eafce20
Make codeblocks copyable
PGijsbers Oct 21, 2023
8a37572
Add project overview and move roadmap content
PGijsbers Oct 21, 2023
2919fe2
Update unit tests to be more strict
PGijsbers Oct 31, 2023
5 changes: 5 additions & 0 deletions .dockerignore
@@ -0,0 +1,5 @@
.github
.mypy_cache
.pytest_cache
.ruff_cache
venv
28 changes: 28 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,28 @@
# https://squidfunk.github.io/mkdocs-material/publishing-your-site/?h=#with-github-actions
name: Deploy Docs
on:
  push:
    branches:
      - "add/docs"
      - main

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install mkdocs-material mkdocs-section-index
      - run: mkdocs gh-deploy --force
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
+docker/mysql/data
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
-3.11
+3.12
54 changes: 0 additions & 54 deletions CONTRIBUTING.md

This file was deleted.

71 changes: 13 additions & 58 deletions README.md
@@ -1,62 +1,17 @@
-# server
-Python-based server prototype
+![Python 3.12](https://img.shields.io/badge/python-3.12-green?logo=python)

-## Development Roadmap
-First we will mimic current server functionality, relying on many implementation details
-present in the current production server:
+# OpenML Server
+This is the Python-based OpenML REST API server.
+It's a rewrite of our [old backend](http://github.com/openml/openml) built with a
+modern Python-based stack.

-- Implement all GET endpoints using the SQL text queries based on PHP implementation,
-which should give near-identical responses to the current JSON endpoints. Minor
-exceptions are permitted but will be documented.
-- Implement non-GET endpoints in similar fashion.
+> [!WARNING]
+> This software is in early stages of development and not ready for production.

-At the same time we may also provide a work-in-progress "new" endpoint, but there won't
-be official support for it at this stage. After we verify the output of the endpoints
-are identical (minus any intentional documented differences), we will officially release
-the new API. The old API will remain available. After that, we can start working on a
-new version of the JSON API which is more standardized, leverages typing, and so on:
+If you simply want to access data stored on OpenML in a programmatic way,
+please have a look at connector packages in
+[Python](https://openml.github.io/openml-python/main/),
+[Java](https://github.com/openml/openml-java),
+or [R](http://openml.github.io/openml-r/).

-- Clean up the database: standardize value formats where possible (e.g., (un)quoting
-contributor names in the dataset's contributor field), and add database level
-constraints on new values.
-- Redesign what the new API responses should look like and implement them,
-API will be available to the public as it is developed.
-- Refactor code-base to use ORM (using `SQLAlchemy`, `SQLModel`, or similar).
-- Officially release the modernized API.
-
-There is no planned sunset date for the old API. This will depend on the progress with
-the new API as well as the usage numbers of the old API.
-
-## Change Notes
-The first iteration of the new server has nearly identical responses to the old JSON
-endpoints, but there are exceptions:
-
-- Providing input of invalid types (e.g., a non-integer dataset id).
-
-HTTP Header:
-```diff
-- 412 Precondition Failed
-+ 422 Unprocessable Entity
-```
-
-JSON Content
-```diff
-- {"error":{"code":"100","message":"Function not valid"}}
-+ {"detail":[{"loc":["query","_dataset_id"],"msg":"value is not a valid integer","type":"type_error.integer"}]}
-```
-
-- For any other error messages, the response is identical except that outer field
-will be `"detail"` instead of `"error"`:
-```diff
-- {"error":{"code":"112","message":"No access granted"}}
-+ {"detail":{"code":"112","message":"No access granted"}}
-```
-
-- Dataset format names are normalized to be all lower-case
-(`"Sparse_ARFF"` -> `"sparse_arff"`).
-- Non-`arff` datasets will not incorrectly have a `"parquet_ur"`:
-https://github.com/openml/OpenML/issues/1189
-- If `"creator"` contains multiple comma-separated creators it is always returned
-as a list, instead of it depending on the quotation used by the original uploader.
-- For (some?) datasets that have multiple values in `"ignore_attribute"`, this field
-is correctly populated instead of omitted.
+For information on getting started, please visit our [documentation](https://openml.github.io/server-api).
32 changes: 32 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,32 @@
services:
  database:
    image: "openml/test-database"
    container_name: "openml-test-database"
    environment:
      MYSQL_ROOT_PASSWORD: ok
    ports:
      - "3306:3306"

  docs:
    build:
      context: .
      dockerfile: docker/docs/Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - .:/docs

  php-api:
    image: "openml/php-rest-api"
    ports:
      - "8002:80"

  python-api:
    container_name: "python-api"
    build:
      context: .
      dockerfile: docker/python/Dockerfile
    ports:
      - "8001:8000"
    volumes:
      - .:/python-api
4 changes: 4 additions & 0 deletions docker/docs/Dockerfile
@@ -0,0 +1,4 @@
FROM squidfunk/mkdocs-material
RUN python -m pip install mkdocs-section-index
ENTRYPOINT ["mkdocs"]
CMD ["serve", "--dev-addr=0.0.0.0:8000"]
3 changes: 3 additions & 0 deletions docker/mysql/Dockerfile
@@ -0,0 +1,3 @@
FROM mysql

COPY ./data /docker-entrypoint-initdb.d
65 changes: 65 additions & 0 deletions docker/mysql/README.md
@@ -0,0 +1,65 @@
# Test Database

The test database image is simply a [MySQL image](https://hub.docker.com/_/mysql/) with
data already present. For general usage, such as setting a password or persisting data
to disk, see the linked MySQL image documentation.

The following command starts the database container:

```bash
docker run -e MYSQL_ROOT_PASSWORD=ok -p 3306:3306 -d --name testdb openml/test-database:latest
```
which sets:

- `-e MYSQL_ROOT_PASSWORD=ok`: the root password is 'ok'
- `-p 3306:3306`: makes the database accessible in the host on port 3306

You should be able to connect to it using `mysql`:
```bash
mysql --host 127.0.0.1 --port 3306 -uroot -pok
```
If you do not have `mysql` installed, refer to the MySQL image documentation on
how to use the image itself as a client and connect to the database over a
docker network.
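
Right after `docker run`, the container may still be starting up, so a first connection attempt can fail.
A minimal Python sketch, assuming the `-p 3306:3306` mapping from the command above, polls until the port accepts TCP connections.
The helper name `wait_for_port` and its default values are illustrative and not part of this repository.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connection only shows that the port is open;
            # it does not verify credentials or the database contents.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 3306)` can be called before running the `mysql` command or a test suite against the container.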

The test database has the following special users:

| id | API key | Comments |
| -- | -- | -- |
| 1 | AD000000000000000000000000000000 | Administrator rights |
| 2 | 00000000000000000000000000000000 | Normal user |
| 16 | DA1A0000000000000000000000000000 | Normal user with private dataset with id 130 |


## Creating the `openml/test-database` image

The following steps were taken to create the image:

1. Create a dump for the current test database:

```text
mysqldump -u root --add-drop-database --databases openml_expdb --result-file=openml_expdb.sql -p
mysqldump -u root --add-drop-database --databases openml --ignore_table=openml.login_attempts --result-file=openml.sql -p
```

`login_attempts` is a legacy table which is not used in production but has a few rows in the current test database.

2. Copy over the files to the local directory:

```bash
scp USERNAME@test.openml.org:/path/to/openml-anonimized.sql data/openml.sql
scp USERNAME@test.openml.org:/path/to/openml_expdb.sql data/openml_expdb.sql
```

3. Anonymize the sensitive information in the openml database:
```text
python openml-kube/k8s_manifests/mysql/migration/anonimize-openml-db.py --input=openml.sql
```
This produces `openml-anonimized.sql` which has user data replaced by fake data.

4. Build and publish the docker image:

```bash
docker build --tag openml/test-database:latest -f Dockerfile .
docker push openml/test-database:latest
```
1 change: 1 addition & 0 deletions docker/mysql/data/openml.sql
@@ -0,0 +1 @@
# stub, see readme on how to generate this file.
1 change: 1 addition & 0 deletions docker/mysql/data/openml_expdb.sql
@@ -0,0 +1 @@
# stub, see readme on how to generate this file.
25 changes: 25 additions & 0 deletions docker/php/Dockerfile
@@ -0,0 +1,25 @@
FROM php:7.4.33-apache

RUN docker-php-source extract \
&& docker-php-ext-install mysqli \
&& docker-php-source delete

RUN apt-get update \
&& apt-get install -y git \
&& git clone https://github.com/openml/openml /var/www/openml

RUN mv /var/www/openml/openml_OS/config/BASE_CONFIG-BLANK.php /var/www/openml/openml_OS/config/BASE_CONFIG.php

RUN mkdir /var/www/openml/logs
RUN mkdir /data


COPY config/*.load /etc/apache2/mods-enabled/
COPY config/api.conf /etc/apache2/sites-enabled/000-default.conf
COPY config/php.ini /usr/local/etc/php/
COPY config/.htaccess /var/www/openml/.htaccess

RUN mkdir /scripts
COPY set_configuration.sh /scripts/

ENTRYPOINT ["bash", "/scripts/set_configuration.sh"]
21 changes: 21 additions & 0 deletions docker/php/README.md
@@ -0,0 +1,21 @@
# Running the Apache PHP backend locally

In most cases, you probably want to run through docker compose.
This file contains instructions for running it on its own.

```bash
docker run -p 8001:80 --rm -it openml/php-rest-api
```

This runs the PHP REST API server and exposes it at `http://localhost:8001/`.
Some `BASE_CONFIG.php` variables can be overwritten with environment variables,
which can be passed in the run command with the `-e` option, e.g.: `-e BASE_URL=http://localhost/`.
See `set_configuration.sh` for the variables which can be overwritten out-of-the-box.
Alternatively, mount your own `BASE_CONFIG.php` into the container at `/var/www/openml/openml_OS/config/BASE_CONFIG.php`.
The `set_configuration.sh` script will only overwrite unset variables.
To avoid overwriting altogether, also change the entrypoint: `--entrypoint=apache2-foreground`.
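
The options above can be combined programmatically. The following Python sketch assembles the `docker run` argument list from a dictionary of environment overrides and an optional entrypoint.
The function name `build_run_command` and its parameters are hypothetical; only the flags shown in this README are used.

```python
def build_run_command(image="openml/php-rest-api", host_port=8001, env=None, entrypoint=None):
    """Assemble the `docker run` argument list described in this README."""
    command = ["docker", "run", "-p", f"{host_port}:80", "--rm", "-it"]
    # Each override becomes one `-e NAME=value` pair.
    for name, value in (env or {}).items():
        command += ["-e", f"{name}={value}"]
    # Optionally replace the default entrypoint, e.g. with apache2-foreground.
    if entrypoint is not None:
        command.append(f"--entrypoint={entrypoint}")
    # The image name goes last, after all options.
    command.append(image)
    return command
```

The resulting list can be passed to `subprocess.run`, or turned into a shell line with `shlex.join` for inspection.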

To connect to a separate container running a MySQL server, both containers need to be on the same docker network.
For both, specify the network with `--network NETWORK_NAME`, which can be any network you create with `docker network create NETWORK_NAME`.
Assuming a connection to the database can be established, to get a dataset description go to `http://127.0.0.1:8001/api/v1/json/data/1`.
Note that the protocol is `http` not `https`.
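
The same request can be made from Python. This is a minimal sketch using only the standard library, assuming the server runs on port 8001 as in the command above and that the database is reachable.
The function names `dataset_url` and `fetch_dataset_description` are illustrative.

```python
import json
from urllib.request import urlopen


def dataset_url(dataset_id: int, base_url: str = "http://127.0.0.1:8001") -> str:
    """Build the JSON endpoint URL for a dataset description."""
    return f"{base_url.rstrip('/')}/api/v1/json/data/{dataset_id}"


def fetch_dataset_description(dataset_id: int, base_url: str = "http://127.0.0.1:8001") -> dict:
    """Request the dataset description and parse the JSON body."""
    # Note the plain `http` scheme in the default base URL.
    with urlopen(dataset_url(dataset_id, base_url), timeout=10) as response:
        return json.load(response)
```

Calling `fetch_dataset_description(1)` requests the same URL as above; the structure of the returned JSON is not covered here.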
3 changes: 3 additions & 0 deletions docker/php/build_docker.sh
@@ -0,0 +1,3 @@
#!/bin/bash

docker build --tag openml/php-rest-api -f Dockerfile .
17 changes: 17 additions & 0 deletions docker/php/config/.htaccess
@@ -0,0 +1,17 @@
RewriteEngine on

# TODO: specific for main instance of OpenML site. Should do something better
RewriteCond %{HTTP_HOST} ^api_new.openml.org
RewriteRule ^(.*)$ http://www.openml.org/api_new/$1 [L,P]

RewriteCond %{HTTPS_HOST} ^api_new.openml.org
RewriteRule ^(.*)$ https://www.openml.org/api_new/$1 [L,P]

RewriteCond $1 !^(questions|SWF|img|docs|downloads|GFX|favicon\.ico|tiny_mce|index\.php|js|css|robots\.txt)
RewriteRule ^(.*)$ index.php/$1 [L]


<IfModule mod_headers.c>
Header set Access-Control-Allow-Origin "*"
Header set Access-Control-Allow-Headers "Origin, X-Requested-With, Content-Type, Accept"
</IfModule>