MP-003: Schema validator#4

Open
jayyaali95 wants to merge 13 commits into main from jayyaali/mp-003

@jayyaali95
Owner

Implemented a schema validator using two approaches: one based on a metaclass, and the other using the `__init_subclass__` hook.
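For readers unfamiliar with the two mechanisms, here is a minimal sketch of how each can collect column declarations when a schema class is defined. It is an illustration only, not the code in this PR: the `Column` stand-in, the `_columns` attribute, and the class names `SchemaMeta`, `MetaClassBase`, `InitSubclassBase`, `SchemaA`, and `SchemaB` are all hypothetical.

```python
class Column:
    """Hypothetical stand-in for the PR's Column class."""

    def __init__(self, dtype):
        self.dtype = dtype


class SchemaMeta(type):
    """Metaclass approach: __new__ runs every time a schema class is created."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Collect Column attributes declared directly in the class body.
        cls._columns = {k: v for k, v in namespace.items() if isinstance(v, Column)}
        return cls


class MetaClassBase(metaclass=SchemaMeta):
    pass


class InitSubclassBase:
    """__init_subclass__ approach: the hook runs once for every subclass."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Same collection step, without needing a custom metaclass.
        cls._columns = {k: v for k, v in vars(cls).items() if isinstance(v, Column)}


class SchemaA(MetaClassBase):
    days = Column(int)
    probability = Column(float)


class SchemaB(InitSubclassBase):
    days = Column(int)
    probability = Column(float)


print(list(SchemaA._columns))
print(list(SchemaB._columns))
```

Both variants end up with the same column mapping. The `__init_subclass__` version avoids a custom metaclass, which can matter if a schema ever needs to combine with another class that already has its own metaclass.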

@jayyaali95 jayyaali95 requested a review from viktorb-lyft June 2, 2025 00:25
import pandas as pd


class Column:
Collaborator

Column could be a dataclass.
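A minimal sketch of what that might look like, assuming `Column` carries a `dtype` and a `nullable` flag. Both field names are hypothetical, since the body of the class is not shown in this excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical fields; the PR's actual attributes may differ.
    dtype: type
    nullable: bool = False


days = Column(dtype=int)

# The dataclass decorator generates __init__, __repr__, and __eq__,
# so two columns with the same fields compare equal.
print(days == Column(dtype=int))  # True
```

`frozen=True` is optional here; it makes column definitions immutable and hashable, which fits declarations that should not change after the schema class is created.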

_allow_extra_columns = False

meta_class_based_df = DataFrame({'days': [1,2,3], 'probability': [0.1, 0.5, 0.9], 'feature': ['a','b','c'], 'extra': [1,2,3]}, schema=TestSchemaMetaClassBase)
init_subclass_based_df = DataFrame({'days': [1,2,3], 'probability': [0.1, 0.5, 0.9], 'feature': ['a','b','c'], 'extra': [1,2,3]}, schema=TestSchemaInitSubclassBase)
Collaborator

This is a slightly different approach from what I had in mind. You are passing the schema to the DataFrame constructor. That works for validating the dataframe itself, but it ties the dataframe to a single schema that we cannot change afterwards. For example:

df = DataFrame()

helper_function1(DataFrame[Schema1])
helper_function2(DataFrame[Schema2])

Schema1 and Schema2 can be completely different; they describe the assumptions that helper_function1 and helper_function2 make about the dataframe. We want to be able to validate the same dataframe against both schemas.
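To make the idea concrete, here is a rough sketch in which validation belongs to the schema rather than to the dataframe, so one dataframe can be checked against several schemas. It rests on several assumptions: a plain dict of lists stands in for a pandas DataFrame to keep the example dependency-free, columns are declared as class annotations, and `Schema.validate` and `allow_extra_columns` are hypothetical names. It also uses explicit `validate` calls instead of the `DataFrame[Schema1]` notation above.

```python
from typing import Any, Dict, List, get_type_hints

# Assumption: a dict of column name -> values stands in for a pandas DataFrame.
Frame = Dict[str, List[Any]]


class Schema:
    """Hypothetical base class; subclasses declare columns as annotations."""

    allow_extra_columns = True

    @classmethod
    def validate(cls, frame: Frame) -> Frame:
        expected = get_type_hints(cls)
        missing = [name for name in expected if name not in frame]
        if missing:
            raise ValueError(f"{cls.__name__}: missing columns {missing}")
        for name, dtype in expected.items():
            if not all(isinstance(value, dtype) for value in frame[name]):
                raise TypeError(f"{cls.__name__}: column {name!r} is not {dtype.__name__}")
        if not cls.allow_extra_columns:
            extra = [name for name in frame if name not in expected]
            if extra:
                raise ValueError(f"{cls.__name__}: unexpected columns {extra}")
        return frame


class Schema1(Schema):
    days: int
    probability: float


class Schema2(Schema):
    feature: str


def helper_function1(frame: Frame) -> int:
    Schema1.validate(frame)  # this helper only assumes Schema1's columns
    return sum(frame["days"])


def helper_function2(frame: Frame) -> str:
    Schema2.validate(frame)  # this helper only assumes Schema2's columns
    return "".join(frame["feature"])


# The same frame is validated against two unrelated schemas.
df = {"days": [1, 2, 3], "probability": [0.1, 0.5, 0.9], "feature": ["a", "b", "c"]}
print(helper_function1(df))
print(helper_function2(df))
```

The point of the design is that the frame itself carries no schema. Each helper states its own assumptions at the call boundary, and a frame that satisfies one schema may still fail another.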
