Contributor

@FelipeAdachi FelipeAdachi commented Nov 21, 2023

Currently, a user who wants to use Langkit for a feature extraction scenario needs to run:

from langkit import toxicity
from whylogs.experimental.core.udf_schema import udf_schema
import pandas as pd

df = pd.DataFrame({"prompt": ["I love you", "I hate you"]})
schema = udf_schema()

df_enhanced, _ = schema.apply_udfs(df)

This unnecessarily exposes the user to whylogs' udf_schema and returns a confusing tuple.

This PR wraps the code above in a langkit.extract function, so the same task becomes:

import langkit
import pandas as pd
from langkit import toxicity

df = pd.DataFrame({"prompt": ["I love you", "I hate you"]})
enhanced_df = langkit.extract(data=df)

Or, for a single row:

import langkit
from langkit import toxicity

row = {"prompt": "I love you", "response": "I hate you"}
enhanced_row = langkit.extract(data=row)

also:

  • incidental error handling in hallucination module
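For readers who want the shape of the wrapper, here is a rough sketch of the idea rather than the code in this PR. It assumes the schema's apply_udfs returns a (batch, row) tuple and accepts either input by keyword. `StubSchema`, the `batch` keyword, and the toy `char_count` feature are hypothetical stand-ins, and a list of dicts stands in for the DataFrame so the sketch runs without pandas or whylogs.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

Row = Dict[str, Any]
Batch = List[Row]  # hypothetical stand-in for a pandas DataFrame


class StubSchema:
    """Hypothetical stand-in for the object returned by udf_schema()."""

    def apply_udfs(
        self, batch: Optional[Batch] = None, row: Optional[Row] = None
    ) -> Tuple[Optional[Batch], Optional[Row]]:
        # Toy feature (an assumption): character count of every text column.
        def enhance(r: Row) -> Row:
            out = dict(r)
            for col, value in r.items():
                if isinstance(value, str):
                    out[f"{col}.char_count"] = len(value)
            return out

        new_batch = [enhance(r) for r in batch] if batch is not None else None
        new_row = enhance(row) if row is not None else None
        return new_batch, new_row


def extract(data: Union[Batch, Row], schema: Optional[StubSchema] = None):
    """Hide the schema and the tuple: return only the enhanced input."""
    schema = schema or StubSchema()
    if isinstance(data, dict):
        # Row case: keep the second element of the tuple.
        _, enhanced_row = schema.apply_udfs(row=data)
        return enhanced_row
    if isinstance(data, list):
        # Batch case: keep the first element of the tuple.
        enhanced_batch, _ = schema.apply_udfs(batch=data)
        return enhanced_batch
    raise TypeError(f"unsupported input type: {type(data).__name__}")


enhanced_row = extract({"prompt": "I love you", "response": "I hate you"})
enhanced_batch = extract([{"prompt": "I love you"}, {"prompt": "I hate you"}])
```

The caller gets back the same kind of object it passed in, which is the point of the wrapper: no schema object and no tuple to unpack.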

Collaborator

@jamie256 jamie256 left a comment


This looks great @FelipeAdachi, thanks!

@jamie256 jamie256 merged commit 23497fa into main Nov 27, 2023
@jamie256 jamie256 deleted the dev/felipe/extract branch November 27, 2023 17:14