
Series.iteritems() removal in pandas 2.x causes crash in split_tt #7

@bleedblack1

Description


The split_tt function in utils.py uses Series.iteritems(), which was deprecated in pandas 1.5.0 and removed in pandas 2.0.0.

Because pyproject.toml does not set an upper bound on the pandas dependency, fresh installations resolve to pandas 2.x, and calling split_tt fails with:

AttributeError: 'Series' object has no attribute 'iteritems'

As a result, the dataset-splitting and model-evaluation utilities that rely on split_tt crash outright.


Steps to Reproduce

pip install trapiche  # a fresh install resolves pandas to >=2.0

Then, in Python:

import pandas as pd
from trapiche.utils import split_tt

df = pd.DataFrame({
    'SAMPLE_ID': ['s1', 's2'],
    'project': ['p1', 'p1'],
    'lineage': ['root:A:B', 'root:A:C'],
    'max_depth': [3, 3],
})

split_tt(df, 0.2, 42, 'root:A')  # raises AttributeError under pandas 2.x

Expected Behavior

The function executes successfully and returns a DataFrame with an IS_TEST column.


Actual Behavior

AttributeError: 'Series' object has no attribute 'iteritems'
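To confirm which side of the removal a given environment falls on, here is a small diagnostic sketch. It assumes only that pandas is importable; the variable names are illustrative and not part of trapiche.

```python
import pandas as pd

# Major version of the installed pandas, e.g. "2.2.3" -> 2
major = int(pd.__version__.split(".")[0])

# Series.iteritems was removed in pandas 2.0.0; Series.items exists in 1.x and 2.x
has_iteritems = hasattr(pd.Series, "iteritems")
has_items = hasattr(pd.Series, "items")

print(f"pandas {pd.__version__}: iteritems={has_iteritems}, items={has_items}")
```

On a fresh install this should report iteritems=False, which matches the traceback above.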

Impact

  • Dataset splitting fails completely
  • Model retraining and evaluation workflows are blocked
  • Fresh installs are unusable with pandas 2.x

Proposed Fix

Replace:

.iteritems()

with:

.items()

Series.items() is supported in all pandas versions ≥1.0 and yields the same (index, value) pairs, so it is a drop-in replacement.
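A minimal before/after sketch of the replacement. The Series below is hypothetical (it loosely mirrors the lineage and max_depth values from the reproduction) and does not reproduce the actual body of split_tt; it only shows that the loop pattern behaves the same with .items().

```python
import pandas as pd

# Hypothetical Series: index = lineage, values = max_depth
depths = pd.Series([3, 3], index=["root:A:B", "root:A:C"])

# pandas < 2.0 only (AttributeError on 2.x):
#     for lineage, depth in depths.iteritems(): ...
# Works on pandas 1.x and 2.x and yields the same (index, value) pairs:
pairs = [(lineage, int(depth)) for lineage, depth in depths.items()]
print(pairs)  # [('root:A:B', 3), ('root:A:C', 3)]
```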


I’d be happy to submit a PR with this fix if it aligns with your expectations.
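DataFrame.iteritems() was removed in the same 2.0.0 release, so it may be worth checking for other call sites before applying the fix. A stdlib-only sketch follows; the helper name find_iteritems_calls and the "trapiche" source path are hypothetical and assume a local checkout of the package.

```python
import re
from pathlib import Path

# Matches method calls such as `s.iteritems()` or `df.iteritems()`
ITERITEMS_CALL = re.compile(r"\.iteritems\s*\(")


def find_iteritems_calls(root):
    """Return (path, line_number, line) for every .iteritems( call in .py files under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    hits = []
    for path in sorted(root_path.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ITERITEMS_CALL.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # "trapiche" is an assumed path to the package sources
    for path, lineno, line in find_iteritems_calls("trapiche"):
        print(f"{path}:{lineno}: {line}")
```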
