Description
The split_tt function in utils.py uses Series.iteritems(), which was deprecated in pandas 1.5.0 and removed in pandas 2.0.0.
Since pyproject.toml does not define an upper bound for the pandas dependency, fresh installations pull pandas 2.x, resulting in the following error:
AttributeError: 'Series' object has no attribute 'iteritems'
This causes a complete crash when using dataset splitting or model evaluation utilities.
Steps to Reproduce
pip install trapiche  # installs pandas>=2.0

import pandas as pd
from trapiche.utils import split_tt
df = pd.DataFrame({
'SAMPLE_ID': ['s1','s2'],
'project': ['p1','p1'],
'lineage': ['root:A:B','root:A:C'],
'max_depth': [3,3]
})
split_tt(df, 0.2, 42, 'root:A')
Expected Behavior
The function executes successfully and returns a DataFrame with an IS_TEST column.
Actual Behavior
AttributeError: 'Series' object has no attribute 'iteritems'
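The root cause can be checked without trapiche installed. The snippet below is a minimal sketch that only inspects the installed pandas. It assumes nothing about the internals of split_tt, and the variable names are illustrative.

```python
import pandas as pd

# Major version of the installed pandas, e.g. 2 for "2.2.1".
major = int(pd.__version__.split(".")[0])

# Series.iteritems was removed in pandas 2.0.0; Series.items remains.
has_iteritems = hasattr(pd.Series, "iteritems")
has_items = hasattr(pd.Series, "items")

print(f"pandas {pd.__version__}: iteritems={has_iteritems}, items={has_items}")
```

On pandas 2.x this reports that iteritems is missing, so any call to Series.iteritems() raises the AttributeError shown above.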
Impact
- Dataset splitting fails completely
- Model retraining and evaluation workflows are blocked
- Fresh installs are unusable with pandas 2.x
Proposed Fix
Replace:
.iteritems()
with:
.items()
.items() is supported in all pandas versions ≥1.0 and works as a direct replacement.
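As a rough illustration of the change, here is a sketch on a stand-in Series. The body of split_tt is not shown in this issue, so the Series and the loop are hypothetical and only demonstrate that .items() yields the same (index, value) pairs that .iteritems() used to.

```python
import pandas as pd

# Hypothetical stand-in for the Series that split_tt iterates over;
# the real function body is not shown in this issue.
lineages = pd.Series(
    ["root:A:B", "root:A:C"],
    index=["s1", "s2"],
    name="lineage",
)

# Before (fails on pandas >= 2.0):
#     for sample_id, lineage in lineages.iteritems(): ...

# After: Series.items() lazily yields (index, value) pairs.
pairs = [(sample_id, lineage) for sample_id, lineage in lineages.items()]
print(pairs)
```

Because .items() also exists in pandas 1.x, the replacement should not need a version check or a compatibility shim.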
I’d be happy to submit a PR with this fix if it aligns with your expectations.