Merged

78 commits
cc9d4e8
test: remove outdated snapshot tests
lars-reimann Jan 8, 2025
43c3908
chore: check attributes of `Table`
lars-reimann Jan 8, 2025
1e192dd
chore: check `from_columns`
lars-reimann Jan 8, 2025
2602cd2
chore: check `__init__`
lars-reimann Jan 8, 2025
ef1cbb8
chore: check `from_csv_file`
lars-reimann Jan 8, 2025
e8c04d8
chore: check `from_dict`
lars-reimann Jan 8, 2025
5b4f344
chore: check `from_json_file`
lars-reimann Jan 8, 2025
b30ed11
test: add missing tests for `from_parquet_file`
lars-reimann Jan 8, 2025
b51d10b
chore: check `count_rows_if`
lars-reimann Jan 8, 2025
831cec7
test: cover missing branches of `add_columns`
lars-reimann Jan 8, 2025
4114de8
test: cover missing branches of `add_table_as_columns`
lars-reimann Jan 8, 2025
01ec74a
test: add tests for `add_computed_column`
lars-reimann Jan 9, 2025
4ec22cf
test: add tests for `add_table_as_rows`
lars-reimann Jan 9, 2025
4b3ad51
chore: check `get_column`
lars-reimann Jan 9, 2025
d75c12a
chore: check `get_column_type`
lars-reimann Jan 9, 2025
25b951f
chore: check `has_column`
lars-reimann Jan 9, 2025
55dc19c
chore: check `remove_columns`
lars-reimann Jan 9, 2025
a52a768
chore: check `remove_columns_except`
lars-reimann Jan 9, 2025
3a8303a
chore: check `rename_column`
lars-reimann Jan 9, 2025
250a349
chore: check `replace_column`
lars-reimann Jan 9, 2025
c2a8285
chore: check `remove_duplicate_rows`
lars-reimann Jan 9, 2025
edc8f73
chore: check `slice_rows`
lars-reimann Jan 9, 2025
35f24f3
chore: check `sort_rows`
lars-reimann Jan 9, 2025
cbd0e96
test: add tests for `sort_rows_by_column`
lars-reimann Jan 9, 2025
deec4ed
chore: check `shuffle_rows`
lars-reimann Jan 9, 2025
1e33467
chore: check `split_rows`
lars-reimann Jan 9, 2025
2de8ba3
chore: check `to_columns`
lars-reimann Jan 9, 2025
99f00c5
chore: check `to_dict`
lars-reimann Jan 9, 2025
2d71068
chore: check `__str__`
lars-reimann Jan 9, 2025
6c57fa0
chore: check `__repr__`
lars-reimann Jan 9, 2025
07b9fc2
chore: check `remove_columns_with_missing_values`
lars-reimann Jan 9, 2025
175eab5
feat: rename `remove_columns_except` to `select_columns`
lars-reimann Jan 9, 2025
9a86019
feat: select columns by predicate
lars-reimann Jan 9, 2025
2c0e15b
docs: point to related API elements
lars-reimann Jan 10, 2025
691f6a9
chore: check `remove_rows_by_column`
lars-reimann Jan 10, 2025
3ffb636
feat: add `filter_rows` and `filter_rows_by_column`
lars-reimann Jan 10, 2025
39bf344
docs: add raises sections
lars-reimann Jan 10, 2025
a361a09
docs: import errors if type checking, so mkdocstrings links them prop…
lars-reimann Jan 10, 2025
e5ff9c5
feat: make some parameters keyword-only
lars-reimann Jan 10, 2025
5285a02
chore: check `remove_non_numeric_columns`
lars-reimann Jan 10, 2025
8001f8c
style: fix ruff errors
lars-reimann Jan 10, 2025
5ace729
fix: mypy errors
lars-reimann Jan 10, 2025
9086444
chore: check table creation from polars objects
lars-reimann Jan 10, 2025
71e01ed
chore: check `__eq__`
lars-reimann Jan 10, 2025
4264fdc
chore: check `__sizeof__`
lars-reimann Jan 10, 2025
d868229
chore: check `__hash__`
lars-reimann Jan 10, 2025
87a6044
chore: check `_repr_html_`
lars-reimann Jan 10, 2025
4b36e17
chore: check `remove_rows_with_missing_values`
lars-reimann Jan 10, 2025
bf4a31d
chore: check `remove_rows_with_outliers`
lars-reimann Jan 10, 2025
75976d3
chore: check `transform_column`
lars-reimann Jan 10, 2025
c49a48e
chore: check methods that create files
lars-reimann Jan 10, 2025
dde0b13
refactor: replace `TransformerNotFittedError` with `NotFittedError`
lars-reimann Jan 10, 2025
da4acdb
refactor: replace `ModelNotFittedError` with `NotFittedError`
lars-reimann Jan 10, 2025
079a1d4
refactor: replace `TransformerNotInvertibleError` with `NotInvertible…
lars-reimann Jan 10, 2025
050574e
feat: new method `add_index_column`
lars-reimann Jan 11, 2025
00b0191
feat: instantiate column types
lars-reimann Jan 11, 2025
3c2b731
feat: finalize Schema class
lars-reimann Jan 11, 2025
1295f47
docs: improve wording
lars-reimann Jan 11, 2025
e79ed16
chore: check `join`
lars-reimann Jan 11, 2025
36bf834
test: `add_table_as_xy` does not mutate other table
lars-reimann Jan 12, 2025
1a96eac
chore: rename `seed` to `random_seed` to make the role of the paramet…
lars-reimann Jan 12, 2025
13453ef
chore: check `summarize_statistics`
lars-reimann Jan 12, 2025
c1e8696
perf: greatly accelerate `Table.summarize_statistics` method
lars-reimann Jan 12, 2025
7e5ba16
refactor: call `Table.summarize_statistics` in `Column.summarize_stat…
lars-reimann Jan 12, 2025
67d8bd3
feat: add multiple tables as columns or rows at once
lars-reimann Jan 12, 2025
9dc24db
chore: minor changes
lars-reimann Jan 12, 2025
790cbc7
test: `to_tabular_dataset`
lars-reimann Jan 12, 2025
b97e466
feat: remove `Table.to_time_series_dataset` for now
lars-reimann Jan 12, 2025
46d5309
test: fix failing tests
lars-reimann Jan 12, 2025
adef8c2
test: fix failing doctests
lars-reimann Jan 12, 2025
e41ed67
feat: raise validation errors from `None` for cleaner stack traces
lars-reimann Jan 12, 2025
a9ab762
docs: fix tutorials
lars-reimann Jan 12, 2025
3c94f45
chore: remove now unnecessary check
lars-reimann Jan 12, 2025
4f352fb
test: ignore a line for coverage
lars-reimann Jan 12, 2025
7e9e2c1
style: reformat files
lars-reimann Jan 12, 2025
d53ad0c
perf: speed up `remove_columns_with_missing_values`
lars-reimann Jan 12, 2025
f1d73d0
style: fix ruff errors
lars-reimann Jan 12, 2025
c973ead
style: apply automated linter fixes
megalinter-bot Jan 12, 2025
4 changes: 2 additions & 2 deletions benchmarks/metrics/classification.py
@@ -3,10 +3,10 @@
from timeit import timeit

import polars as pl
from safeds.data.tabular.containers import Table
from safeds.ml.metrics import ClassificationMetrics

from benchmarks.table.utils import create_synthetic_table
from safeds.data.tabular.containers import Table
from safeds.ml.metrics import ClassificationMetrics

REPETITIONS = 10

3 changes: 1 addition & 2 deletions benchmarks/table/column_operations.py
@@ -1,8 +1,7 @@
from timeit import timeit

from safeds.data.tabular.containers import Table

from benchmarks.table.utils import create_synthetic_table
from safeds.data.tabular.containers import Table

REPETITIONS = 10

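Both benchmark modules touched above follow the same pattern: build a synthetic table once via `create_synthetic_table`, then measure each operation with `timeit` over a fixed `REPETITIONS` count. A minimal sketch of that pattern follows — the real `benchmarks.table.utils.create_synthetic_table` builds a safeds `Table`, so the helper below is a trivial stand-in:

```python
from timeit import timeit

REPETITIONS = 10


def create_synthetic_table(n_rows: int) -> dict[str, list[int]]:
    # Trivial stand-in for benchmarks.table.utils.create_synthetic_table,
    # which in the repository returns a safeds Table.
    return {"a": list(range(n_rows)), "b": list(range(n_rows))}


table = create_synthetic_table(10_000)

# timeit returns the total wall-clock seconds for all repetitions,
# so per-operation cost is seconds / REPETITIONS.
seconds = timeit(lambda: [x * 2 for x in table["a"]], number=REPETITIONS)
print(f"doubled column {REPETITIONS} times in {seconds:.4f}s")
```

Building the table outside the timed lambda is the important detail: setup cost is excluded, so only the operation under test is measured.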
102 changes: 65 additions & 37 deletions docs/tutorials/classification.ipynb
@@ -3,7 +3,10 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"In this tutorial, we use `safeds` on **Titanic passenger data** to predict who will survive and who will not."
@@ -12,7 +15,10 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Loading Data\n",
@@ -23,7 +29,10 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@@ -75,7 +84,7 @@
"from safeds.data.tabular.containers import Table\n",
"\n",
"raw_data = Table.from_csv_file(\"data/titanic.csv\")\n",
"#For visualisation purposes we only print out the first 15 rows.\n",
"# For visualisation purposes we only print out the first 15 rows.\n",
"raw_data.slice_rows(length=15)"
]
},
@@ -169,18 +178,18 @@
"source": [
"We remove certain columns for the following reasons:\n",
"1. **high idness**: `id` , `ticket`\n",
"2. **high stability**: `parents_children` \n",
"2. **high stability**: `parents_children`\n",
"3. **high missing value ratio**: `cabin`"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_table = train_table.remove_columns([\"id\",\"ticket\", \"parents_children\", \"cabin\"])\n",
"test_table = test_table.remove_columns([\"id\",\"ticket\", \"parents_children\", \"cabin\"])"
"train_table = train_table.remove_columns([\"id\", \"ticket\", \"parents_children\", \"cabin\"])\n",
"test_table = test_table.remove_columns([\"id\", \"ticket\", \"parents_children\", \"cabin\"])"
]
},
{
@@ -199,15 +208,18 @@
"source": [
"from safeds.data.tabular.transformation import SimpleImputer\n",
"\n",
"simple_imputer = SimpleImputer(column_names=[\"age\",\"fare\"],strategy=SimpleImputer.Strategy.mean())\n",
"simple_imputer = SimpleImputer(column_names=[\"age\", \"fare\"], strategy=SimpleImputer.Strategy.mean())\n",
"fitted_simple_imputer_train, transformed_train_data = simple_imputer.fit_and_transform(train_table)\n",
"transformed_test_data = fitted_simple_imputer_train.transform(test_table)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Handling Nominal Categorical Data\n",
@@ -219,13 +231,18 @@
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from safeds.data.tabular.transformation import OneHotEncoder\n",
"\n",
"fitted_one_hot_encoder_train, transformed_train_data = OneHotEncoder(column_names=[\"sex\", \"port_embarked\"]).fit_and_transform(transformed_train_data)\n",
"fitted_one_hot_encoder_train, transformed_train_data = OneHotEncoder(\n",
" column_names=[\"sex\", \"port_embarked\"],\n",
").fit_and_transform(transformed_train_data)\n",
"transformed_test_data = fitted_one_hot_encoder_train.transform(transformed_test_data)"
]
},
@@ -299,7 +316,10 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Marking the Target Column\n",
@@ -314,17 +334,23 @@
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"tagged_train_table = transformed_train_data.to_tabular_dataset(\"survived\",extra_names=[\"name\"])"
"tagged_train_table = transformed_train_data.to_tabular_dataset(\"survived\", extra_names=[\"name\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Fitting a Classifier\n",
@@ -335,7 +361,10 @@
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -348,7 +377,10 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Predicting with the Classifier\n",
@@ -360,7 +392,10 @@
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@@ -433,14 +468,17 @@
],
"source": [
"reverse_transformed_prediction = prediction.to_table().inverse_transform_table(fitted_one_hot_encoder_train)\n",
"#For visualisation purposes we only print out the first 15 rows.\n",
"# For visualisation purposes we only print out the first 15 rows.\n",
"reverse_transformed_prediction.slice_rows(length=15)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Testing the Accuracy of the Model\n",
@@ -449,28 +487,18 @@
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy on test data: 79.3893%\n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"accuracy = fitted_classifier.accuracy(transformed_test_data) * 100\n",
"print(f'Accuracy on test data: {accuracy:.4f}%')"
"f\"Accuracy on test data: {accuracy:.4f}%\""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -488,5 +516,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}