Merged
57 commits
dc249d3
feat: move `StringCell` and `TemporalCell` classes into `query` package
lars-reimann Jan 15, 2025
35cea66
feat: rename `StringCell` to `StringOperations` and `TemporalCell` to…
lars-reimann Jan 15, 2025
160c334
chore: more readable output of `_LazyCell.__repr__`
lars-reimann Jan 15, 2025
fd3f1c1
chore: add `__eq__`, `__repr__`, and `__str__` methods
lars-reimann Jan 15, 2025
5883d54
chore: check `__sizeof__`
lars-reimann Jan 15, 2025
ace9ee8
chore: check `__hash__`
lars-reimann Jan 15, 2025
4227174
chore: check `__eq__`
lars-reimann Jan 15, 2025
3788c78
feat: remove unneeded trailing underscores from parameter names of pu…
lars-reimann Jan 16, 2025
c423d35
feat: update namespaces for cell operations
lars-reimann Jan 16, 2025
7a791bd
feat: test dunder methods of `_LazyDurationOperations`
lars-reimann Jan 16, 2025
401f368
feat: test dunder methods of `_LazyMathOperations`
lars-reimann Jan 16, 2025
c69e088
feat: test dunder methods of `_LazyDatetimeOperations`
lars-reimann Jan 16, 2025
269cd21
test: remove some unneeded tests
lars-reimann Jan 16, 2025
5324519
docs: include summary
lars-reimann Jan 16, 2025
c5bb2af
docs: use list instead of table (more readable in IDE)
lars-reimann Jan 16, 2025
dc9ce74
docs: document parameters and results of cell, so they show up in the…
lars-reimann Jan 16, 2025
de4026d
docs: use raw text for method name for consistency
lars-reimann Jan 16, 2025
4ad2e45
feat: duration operations
lars-reimann Jan 16, 2025
cac8bb0
WIP
lars-reimann Jan 16, 2025
4e9def3
docs: update outputs
lars-reimann Jan 16, 2025
815eb46
docs: update outputs
lars-reimann Jan 16, 2025
614b180
feat: only one method to convert datetime values to string
lars-reimann Jan 16, 2025
0c910a6
feat: add more datetime operations
lars-reimann Jan 16, 2025
1a3b123
feat: add `replace` method
lars-reimann Jan 16, 2025
edd4b58
refactor: extract method to get similar strings
lars-reimann Jan 18, 2025
dff8a5b
feat: custom format strings for datetimes
lars-reimann Jan 18, 2025
88bae5f
docs: add documentation for `dt.to_string` method
lars-reimann Jan 18, 2025
ef8d2b9
test: `unix_timestamp`
lars-reimann Jan 19, 2025
8e2f40c
test: `is_in_leap_year`
lars-reimann Jan 19, 2025
2ff0e92
test: `year`
lars-reimann Jan 19, 2025
5e3d2b6
test: `month`
lars-reimann Jan 19, 2025
d6ddde6
test: `day`
lars-reimann Jan 19, 2025
0410538
test: `date`
lars-reimann Jan 19, 2025
7d60af4
test: `day_of_week`
lars-reimann Jan 19, 2025
7db6299
test: `day_of_year`
lars-reimann Jan 19, 2025
9952ae5
test: `century`
lars-reimann Jan 19, 2025
8152da2
test: `millennium`
lars-reimann Jan 19, 2025
d3b49c2
test: `quarter`
lars-reimann Jan 19, 2025
c260e01
test: `week`
lars-reimann Jan 19, 2025
bdbbcd5
test: `second`
lars-reimann Jan 19, 2025
1385093
test: `minute`
lars-reimann Jan 19, 2025
fe9c868
test: `hour`
lars-reimann Jan 19, 2025
c29f7a4
test: `microsecond`
lars-reimann Jan 19, 2025
28d78ff
test: `millisecond`
lars-reimann Jan 19, 2025
e29c00a
test: `replace`
lars-reimann Jan 19, 2025
de11c55
test: first tests for `to_string`
lars-reimann Jan 19, 2025
1bda757
feat: specify time zone of ColumnType.datetime
lars-reimann Jan 19, 2025
143bd18
feat: specify time zone for `Cell.datetime`
lars-reimann Jan 19, 2025
bfcafd1
refactor: store LazyFrame in Column
lars-reimann Jan 19, 2025
6786c2b
refactor: create Column from a LazyFrame
lars-reimann Jan 19, 2025
e8a5f9e
perf: make some operations lazy
lars-reimann Jan 19, 2025
a7a4ad1
test: `dt.to_string`
lars-reimann Jan 19, 2025
47930e1
style: fix mypy errors
lars-reimann Jan 19, 2025
b9acd7f
fix: wrong import
lars-reimann Jan 19, 2025
bd9a0b2
fix: failing tests
lars-reimann Jan 19, 2025
209807e
style: apply automated linter fixes
megalinter-bot Jan 19, 2025
9af7503
test: add missing tests
lars-reimann Jan 19, 2025
58 changes: 4 additions & 54 deletions docs/tutorials/data_processing.ipynb
@@ -688,64 +688,14 @@
]
},
{
"metadata": {},
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-24T11:02:33.599165800Z",
"start_time": "2024-05-24T11:02:33.479893800Z"
},
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div><style>\n",
".dataframe > thead > tr,\n",
".dataframe > tbody > tr {\n",
" text-align: right;\n",
" white-space: pre-wrap;\n",
"}\n",
"</style>\n",
"<small>shape: (10, 12)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>id</th><th>name</th><th>sex</th><th>age</th><th>siblings_spouses</th><th>parents_children</th><th>ticket</th><th>travel_class</th><th>fare</th><th>cabin</th><th>port_embarked</th><th>survived</th></tr><tr><td>i64</td><td>str</td><td>str</td><td>f64</td><td>i64</td><td>i64</td><td>str</td><td>i64</td><td>f64</td><td>str</td><td>str</td><td>i64</td></tr></thead><tbody><tr><td>0</td><td>&quot;Abbing, Mr. Anthony&quot;</td><td>&quot;male&quot;</td><td>0.524008</td><td>0</td><td>0</td><td>&quot;C.A. 5547&quot;</td><td>3</td><td>7.55</td><td>null</td><td>&quot;Southampton&quot;</td><td>0</td></tr><tr><td>1</td><td>&quot;Abbott, Master. Eugene Joseph&quot;</td><td>&quot;male&quot;</td><td>0.160751</td><td>0</td><td>2</td><td>&quot;C.A. 2673&quot;</td><td>3</td><td>20.25</td><td>null</td><td>&quot;Southampton&quot;</td><td>0</td></tr><tr><td>2</td><td>&quot;Abbott, Mr. Rossmore Edward&quot;</td><td>&quot;male&quot;</td><td>0.19833</td><td>1</td><td>1</td><td>&quot;C.A. 2673&quot;</td><td>3</td><td>20.25</td><td>null</td><td>&quot;Southampton&quot;</td><td>0</td></tr><tr><td>3</td><td>&quot;Abbott, Mrs. Stanton (Rosa Hun…</td><td>&quot;female&quot;</td><td>0.436325</td><td>1</td><td>1</td><td>&quot;C.A. 2673&quot;</td><td>3</td><td>20.25</td><td>null</td><td>&quot;Southampton&quot;</td><td>1</td></tr><tr><td>4</td><td>&quot;Abelseth, Miss. Karen Marie&quot;</td><td>&quot;female&quot;</td><td>0.19833</td><td>0</td><td>0</td><td>&quot;348125&quot;</td><td>3</td><td>7.65</td><td>null</td><td>&quot;Southampton&quot;</td><td>1</td></tr><tr><td>5</td><td>&quot;Abelseth, Mr. Olaus Jorgensen&quot;</td><td>&quot;male&quot;</td><td>0.311064</td><td>0</td><td>0</td><td>&quot;348122&quot;</td><td>3</td><td>7.65</td><td>&quot;F G63&quot;</td><td>&quot;Southampton&quot;</td><td>1</td></tr><tr><td>6</td><td>&quot;Abelson, Mr. 
Samuel&quot;</td><td>&quot;male&quot;</td><td>0.373695</td><td>1</td><td>0</td><td>&quot;P/PP 3381&quot;</td><td>2</td><td>24.0</td><td>null</td><td>&quot;Cherbourg&quot;</td><td>0</td></tr><tr><td>7</td><td>&quot;Abelson, Mrs. Samuel (Hannah W…</td><td>&quot;female&quot;</td><td>0.348643</td><td>1</td><td>0</td><td>&quot;P/PP 3381&quot;</td><td>2</td><td>24.0</td><td>null</td><td>&quot;Cherbourg&quot;</td><td>1</td></tr><tr><td>8</td><td>&quot;Abrahamsson, Mr. Abraham Augus…</td><td>&quot;male&quot;</td><td>0.248434</td><td>0</td><td>0</td><td>&quot;SOTON/O2 3101284&quot;</td><td>3</td><td>7.925</td><td>null</td><td>&quot;Southampton&quot;</td><td>1</td></tr><tr><td>9</td><td>&quot;Abrahim, Mrs. Joseph (Sophie H…</td><td>&quot;female&quot;</td><td>0.223382</td><td>0</td><td>0</td><td>&quot;2657&quot;</td><td>3</td><td>7.2292</td><td>null</td><td>&quot;Cherbourg&quot;</td><td>1</td></tr></tbody></table></div>"
],
"text/plain": [
"+-----+-----------------------+--------+---------+---+----------+-------+---------------+----------+\n",
"| id | name | sex | age | … | fare | cabin | port_embarked | survived |\n",
"| --- | --- | --- | --- | | --- | --- | --- | --- |\n",
"| i64 | str | str | f64 | | f64 | str | str | i64 |\n",
"+==================================================================================================+\n",
"| 0 | Abbing, Mr. Anthony | male | 0.52401 | … | 7.55000 | null | Southampton | 0 |\n",
"| 1 | Abbott, Master. | male | 0.16075 | … | 20.25000 | null | Southampton | 0 |\n",
"| | Eugene Joseph | | | | | | | |\n",
"| 2 | Abbott, Mr. Rossmore | male | 0.19833 | … | 20.25000 | null | Southampton | 0 |\n",
"| | Edward | | | | | | | |\n",
"| 3 | Abbott, Mrs. Stanton | female | 0.43633 | … | 20.25000 | null | Southampton | 1 |\n",
"| | (Rosa Hun… | | | | | | | |\n",
"| 4 | Abelseth, Miss. Karen | female | 0.19833 | … | 7.65000 | null | Southampton | 1 |\n",
"| | Marie | | | | | | | |\n",
"| 5 | Abelseth, Mr. Olaus | male | 0.31106 | … | 7.65000 | F G63 | Southampton | 1 |\n",
"| | Jorgensen | | | | | | | |\n",
"| 6 | Abelson, Mr. Samuel | male | 0.37369 | … | 24.00000 | null | Cherbourg | 0 |\n",
"| 7 | Abelson, Mrs. Samuel | female | 0.34864 | … | 24.00000 | null | Cherbourg | 1 |\n",
"| | (Hannah W… | | | | | | | |\n",
"| 8 | Abrahamsson, Mr. | male | 0.24843 | … | 7.92500 | null | Southampton | 1 |\n",
"| | Abraham Augus… | | | | | | | |\n",
"| 9 | Abrahim, Mrs. Joseph | female | 0.22338 | … | 7.22920 | null | Cherbourg | 1 |\n",
"| | (Sophie H… | | | | | | | |\n",
"+-----+-----------------------+--------+---------+---+----------+-------+---------------+----------+"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"execution_count": null,
"source": [
"from safeds.data.tabular.transformation import RangeScaler\n",
"\n",
"scaler = RangeScaler(selector=\"age\", min_=0.0, max_=1.0).fit(titanic)\n",
"scaler = RangeScaler(selector=\"age\", min=0.0, max=1.0).fit(titanic)\n",
"scaler.transform(titanic_slice)"
]
},
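The notebook change above renames the `min_`/`max_` keyword arguments to `min`/`max`. Parameter names like these shadow Python builtins inside the function body, which is what the linter rule A002 (ignored later in this diff) warns about. The following is a minimal sketch of that trade-off; `scale_to_range` is a hypothetical helper written for illustration and is not the implementation of `RangeScaler`.

```python
import builtins


def scale_to_range(values: list[float], min: float = 0.0, max: float = 1.0) -> list[float]:
    """Linearly rescale `values` into the interval [min, max] (illustrative only)."""
    if min >= max:
        raise ValueError("min must be less than max")

    # Inside this function, `min` and `max` are the parameters, not the builtins.
    # The builtins stay reachable through the `builtins` module.
    lowest = builtins.min(values)
    highest = builtins.max(values)

    if highest == lowest:
        # All values are identical, so map them to the lower bound.
        return [min for _ in values]

    span = highest - lowest
    return [min + (value - lowest) / span * (max - min) for value in values]


print(scale_to_range([10.0, 20.0, 30.0], min=0.0, max=1.0))
```

The call site reads naturally (`min=0.0, max=1.0`), at the cost of having to reach for `builtins.min` in the rare case where the function body needs the builtin.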
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -84,6 +84,7 @@ plugins:
show_signature: false
show_symbol_type_heading: true
show_symbol_type_toc: true
summary: true
- gen-files:
scripts:
- docs/reference/generate_reference_pages.py
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -146,6 +146,8 @@ ignore = [
"FBT001",
# boolean-default-value-in-function-definition (we leave it to the call-site)
"FBT002",
# builtin-argument-shadowing (we want readable parameter names in our API)
"A002",
# builtin-attribute-shadowing (not an issue)
"A003",
# implicit-return (can add a return even though all cases are covered)
8 changes: 5 additions & 3 deletions src/safeds/_typing/__init__.py
@@ -6,21 +6,23 @@

from safeds.data.tabular.containers import Cell

# Literals
_NumericLiteral: TypeAlias = int | float | Decimal
_TemporalLiteral: TypeAlias = datetime.date | datetime.time | datetime.datetime | datetime.timedelta
_PythonLiteral: TypeAlias = _NumericLiteral | bool | str | bytes | _TemporalLiteral

# Convertible to cell (we cannot restrict `Cell`, because `Row.get_cell` returns a `Cell[Any]`)
_ConvertibleToCell: TypeAlias = _PythonLiteral | Cell | None
_BooleanCell: TypeAlias = Cell[bool | None]
# We cannot restrict `Cell`, because `Row.get_cell` returns a `Cell[Any]`.
_ConvertibleToBooleanCell: TypeAlias = bool | Cell | None
_ConvertibleToIntCell: TypeAlias = int | Cell | None
_ConvertibleToStringCell: TypeAlias = str | Cell | None


__all__ = [
    "_BooleanCell",
    "_ConvertibleToBooleanCell",
    "_ConvertibleToCell",
    "_ConvertibleToIntCell",
    "_ConvertibleToStringCell",
    "_NumericLiteral",
    "_PythonLiteral",
    "_TemporalLiteral",
9 changes: 6 additions & 3 deletions src/safeds/_utils/__init__.py
@@ -10,23 +10,26 @@
from ._lazy import _safe_collect_lazy_frame, _safe_collect_lazy_frame_schema
from ._plotting import _figure_to_image
from ._random import _get_random_seed
from ._string import _get_similar_strings

apipkg.initpkg(
    __name__,
    {
        "_compute_duplicates": "._collections:_compute_duplicates",
        "_structural_hash": "._hashing:_structural_hash",
        "_safe_collect_lazy_frame": "._lazy:_safe_collect_lazy_frame",
        "_safe_collect_lazy_frame_schema": "._lazy:_safe_collect_lazy_frame_schema",
        "_figure_to_image": "._plotting:_figure_to_image",
        "_get_random_seed": "._random:_get_random_seed",
        "_get_similar_strings": "._string:_get_similar_strings",
        "_safe_collect_lazy_frame": "._lazy:_safe_collect_lazy_frame",
        "_safe_collect_lazy_frame_schema": "._lazy:_safe_collect_lazy_frame_schema",
        "_structural_hash": "._hashing:_structural_hash",
    },
)

__all__ = [
    "_compute_duplicates",
    "_figure_to_image",
    "_get_random_seed",
    "_get_similar_strings",
    "_safe_collect_lazy_frame",
    "_safe_collect_lazy_frame_schema",
    "_structural_hash",
17 changes: 17 additions & 0 deletions src/safeds/_utils/_string.py
@@ -0,0 +1,17 @@
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from collections.abc import Iterable


def _get_similar_strings(string: str, valid_strings: Iterable[str]) -> list[str]:
    from difflib import get_close_matches

    close_matches = get_close_matches(string, valid_strings, n=3)

    if close_matches and close_matches[0] == string:
        return close_matches[0:1]
    else:
        return close_matches
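To make the behavior of the extracted helper concrete, here is a small self-contained sketch. It copies the logic of `_get_similar_strings` under the hypothetical name `get_similar_strings`; the column names are made up for illustration.

```python
from collections.abc import Iterable
from difflib import get_close_matches


def get_similar_strings(string: str, valid_strings: Iterable[str]) -> list[str]:
    # Same logic as the helper in the diff: ask difflib for up to three
    # candidates above its default similarity cutoff (0.6).
    close_matches = get_close_matches(string, valid_strings, n=3)

    # If the best candidate is the string itself, return only that one.
    if close_matches and close_matches[0] == string:
        return close_matches[0:1]
    return close_matches


columns = ["age", "name", "sex", "fare"]

# A typo close to exactly one column name.
print(get_similar_strings("agee", columns))  # → ['age']

# Nothing is similar enough, so the list is empty.
print(get_similar_strings("zzz", columns))  # → []

# An exact match suppresses the other near matches.
print(get_similar_strings("age", ["age", "ages", "cage"]))  # → ['age']
```

Because the helper only takes a string and an iterable of strings, the same function can back both the "unknown column" message and the "invalid time zone" message in this pull request.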
6 changes: 6 additions & 0 deletions src/safeds/_validation/__init__.py
@@ -13,6 +13,8 @@
from ._check_indices_module import _check_indices
from ._check_row_counts_are_equal_module import _check_row_counts_are_equal
from ._check_schema_module import _check_schema
from ._check_time_zone_module import _check_time_zone
from ._convert_and_check_datetime_format_module import _convert_and_check_datetime_format
from ._normalize_and_check_file_path_module import _normalize_and_check_file_path

apipkg.initpkg(
@@ -29,6 +31,8 @@
        "_check_indices": "._check_indices_module:_check_indices",
        "_check_row_counts_are_equal": "._check_row_counts_are_equal_module:_check_row_counts_are_equal",
        "_check_schema": "._check_schema_module:_check_schema",
        "_check_time_zone": "._check_time_zone_module:_check_time_zone",
        "_convert_and_check_datetime_format": "._convert_and_check_datetime_format_module:_convert_and_check_datetime_format",
        "_normalize_and_check_file_path": "._normalize_and_check_file_path_module:_normalize_and_check_file_path",
    },
)
@@ -45,5 +49,7 @@
    "_check_indices",
    "_check_row_counts_are_equal",
    "_check_schema",
    "_check_time_zone",
    "_convert_and_check_datetime_format",
    "_normalize_and_check_file_path",
]
14 changes: 2 additions & 12 deletions src/safeds/_validation/_check_columns_exist_module.py
@@ -4,6 +4,7 @@

from typing import TYPE_CHECKING

from safeds._utils import _get_similar_strings
from safeds.exceptions import ColumnNotFoundError

if TYPE_CHECKING:
@@ -52,20 +53,9 @@ def _build_error_message(schema: Schema, unknown_names: list[str]) -> str:
    result = "Could not find column(s):"

    for unknown_name in unknown_names:
        similar_columns = _get_similar_column_names(schema, unknown_name)
        similar_columns = _get_similar_strings(unknown_name, schema.column_names)
        result += f"\n - '{unknown_name}'"
        if similar_columns:
            result += f": Did you mean one of {similar_columns}?"

    return result


def _get_similar_column_names(schema: Schema, name: str) -> list[str]:
    from difflib import get_close_matches

    close_matches = get_close_matches(name, schema.column_names, n=3)

    if close_matches and close_matches[0] == name:
        return close_matches[0:1]
    else:
        return close_matches
34 changes: 34 additions & 0 deletions src/safeds/_validation/_check_time_zone_module.py
@@ -0,0 +1,34 @@
import zoneinfo

from safeds._utils import _get_similar_strings

_VALID_TZ_IDENTIFIERS = zoneinfo.available_timezones()


def _check_time_zone(time_zone: str | None) -> None:
    """
    Check if the time zone is valid.

    Parameters
    ----------
    time_zone:
        The time zone to check.

    Raises
    ------
    ValueError
        If the time zone is invalid.
    """
    if time_zone is not None and time_zone not in _VALID_TZ_IDENTIFIERS:
        message = _build_error_message(time_zone)
        raise ValueError(message)


def _build_error_message(time_zone: str) -> str:
    result = f"Invalid time zone '{time_zone}'."

    similar_time_zones = _get_similar_strings(time_zone, _VALID_TZ_IDENTIFIERS)
    if similar_time_zones:  # pragma: no cover
        result += f" Did you mean one of {similar_time_zones}?"

    return result