noisyllm

Toolkit for robust LLM fine-tuning on noisy training data.

Companion open-source implementation for the paper:

Fine-Tuning LLMs for Robust Classification in Noisy Data Environments. Journal of Information Systems Engineering and Management (JISEM), 2024, 9(1). e-ISSN: 2468-4376

Overview

Real-world training data is rarely clean. Mislabeled examples, inconsistent annotations, and sparse data are common in production environments such as e-commerce search, financial services, and customer support. This library provides the tooling to detect noise in classification datasets, clean and augment training data, evaluate model robustness across noise types and levels, and reproduce the benchmark experiments from the paper.

Modules

Module	Purpose
`noisyllm.detect`	Cross-validation confidence scoring to flag likely mislabeled examples
`noisyllm.clean`	Filter high-confidence noise with configurable thresholds
`noisyllm.train`	Pydantic configs for noise-robust training (label smoothing, curriculum learning)
`noisyllm.eval`	Robustness evaluator across noise types and levels
`noisyllm.benchmark`	Synthetic noisy benchmark datasets for reproducible comparison
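
To make the idea behind cross-validation confidence scoring concrete, here is a minimal standard-library sketch. It is not the `noisyllm.detect` implementation: the tiny naive Bayes classifier and the names `train_nb`, `predict_proba`, and `flag_suspects` are assumptions chosen for illustration. Each sample is scored by a model that never saw it during training, and a sample is flagged when the probability assigned to its given label falls below a threshold.

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Fit a tiny multinomial naive Bayes model (stand-in classifier)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return {label: probability} using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in text.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    peak = max(log_scores.values())
    exp = {k: math.exp(v - peak) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def flag_suspects(texts, labels, k=3, confidence_threshold=0.3):
    """Flag indices whose given label gets low out-of-fold probability."""
    flagged = []
    for fold in range(k):
        held_out = [i for i in range(len(texts)) if i % k == fold]
        train = [i for i in range(len(texts)) if i % k != fold]
        model = train_nb([texts[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            p = predict_proba(model, texts[i]).get(labels[i], 0.0)
            if p < confidence_threshold:
                flagged.append(i)
    return sorted(flagged)


# Toy data: index 6 says "good" but is deliberately labeled "neg".
texts = ["good", "bad", "good", "bad", "good", "bad", "good", "bad", "good"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg", "neg", "neg", "pos"]
suspects = flag_suspects(texts, labels)
print(suspects)  # [6]
```

A lower threshold flags fewer, higher-confidence suspects; a higher threshold catches more noise at the cost of more false alarms.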

Installation

pip install noisyllm

Or with uv:

uv add noisyllm

Quick Start

from noisyllm.detect import NoiseProfiler
from noisyllm.clean import DataCleaner
from noisyllm.eval import RobustnessEvaluator

# `dataset`, `test_dataset`, and `model` are assumed to be loaded beforehand.
# Detect noise in a labeled dataset
profiler = NoiseProfiler(text_col="text", label_col="label", confidence_threshold=0.3)
report = profiler.analyze(dataset)
print(report.summary())
# Dataset: 5000 samples
# Estimated noise rate: 8.1%
# Flagged samples: 407

# Clean the dataset by removing high-confidence mislabels
cleaner = DataCleaner(noise_report=report, filter_threshold=0.7)
result = cleaner.clean(dataset)
print(result.summary())
# Original: 5000 | Filtered: 312 | Final: 4688

# Evaluate robustness of a trained classifier
evaluator = RobustnessEvaluator(predict_fn=model.predict)
eval_report = evaluator.evaluate(
    clean_test=test_dataset,
    noise_levels=[0.05, 0.10, 0.15, 0.20],
    noise_types=["label_flip", "text_corruption"],
)
print(eval_report.summary())
# Base accuracy (clean): 94.2%
# Robustness index: 0.91
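
For intuition about what a robustness evaluation measures, the following standard-library sketch corrupts test inputs at several noise levels and compares accuracy against the clean baseline. Everything here is an assumption made for illustration: the character-masking corruption, the convention that `predict_fn` takes a list of texts and returns a list of labels, and the definition of the index as mean noisy accuracy divided by clean accuracy. The library's own `text_corruption` noise and robustness index may be defined differently.

```python
import random


def corrupt_text(text, rate, rng):
    """Mask each non-space character with '*' with probability `rate`."""
    return "".join(
        "*" if ch != " " and rng.random() < rate else ch for ch in text
    )


def accuracy(predict_fn, texts, labels):
    """Fraction of predictions that match the reference labels."""
    preds = predict_fn(texts)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def robustness_curve(predict_fn, texts, labels, noise_levels, seed=0):
    """Return (clean accuracy, accuracy per noise level, hypothetical index)."""
    rng = random.Random(seed)
    clean_acc = accuracy(predict_fn, texts, labels)
    curve = {}
    for level in noise_levels:
        noisy = [corrupt_text(t, level, rng) for t in texts]
        curve[level] = accuracy(predict_fn, noisy, labels)
    # Assumed index: mean noisy accuracy relative to clean accuracy.
    index = sum(curve.values()) / (len(curve) * clean_acc) if clean_acc else 0.0
    return clean_acc, curve, index


def keyword_predict(texts):
    """Toy classifier: anything mentioning 'refund' is a billing request."""
    return ["billing" if "refund" in t else "shipping" for t in texts]


test_texts = ["refund please", "where is my parcel"]
test_labels = ["billing", "shipping"]
clean_acc, curve, index = robustness_curve(
    keyword_predict, test_texts, test_labels, noise_levels=[0.0, 1.0]
)
print(clean_acc, curve, index)  # 1.0 {0.0: 1.0, 1.0: 0.5} 0.75
```

The two extreme levels are used so the result does not depend on the random draws: at 0.0 nothing changes, and at 1.0 every keyword is masked, so the toy classifier falls back to "shipping" for both inputs.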

Benchmarks

from noisyllm.benchmark import load_benchmark

dataset = load_benchmark("intent_classification", noise_level=0.10)
# Returns: BenchmarkDataset with train (noisy), test (clean), label_set

Available benchmarks: intent_classification, sentiment, document
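
As a rough picture of how a noisy training split can be derived from clean labels, the sketch below flips a fixed fraction of labels to a different class using a seeded random generator. The function name `flip_labels` and the uniform flipping scheme are assumptions for illustration; the noise model behind `load_benchmark` is not described here and may differ.

```python
import random


def flip_labels(labels, label_set, noise_level, seed=0):
    """Flip round(noise_level * n) labels to a different, uniformly chosen class."""
    rng = random.Random(seed)  # seeded so the noisy split is reproducible
    n_flip = round(noise_level * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    noisy = []
    for i, y in enumerate(labels):
        if i in flip_idx:
            alternatives = [c for c in label_set if c != y]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(y)
    return noisy, sorted(flip_idx)


label_set = ["book_flight", "cancel_order", "check_balance"]
clean = label_set * 10  # 30 clean labels
noisy, flipped = flip_labels(clean, label_set, noise_level=0.10)
print(len(flipped))  # 3
```

Returning the flipped indices alongside the noisy labels keeps the ground truth available, which is what allows a noise detector to be scored against the injected noise.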

Development

uv sync --all-extras
uv run pytest tests/ -v --cov=src
uv run isort src/ tests/ && uv run black src/ tests/

Citation

If you use this library in your research, please cite the paper:

Fine-Tuning LLMs for Robust Classification in Noisy Data Environments.
Journal of Information Systems Engineering and Management (JISEM), 2024, 9(1).
e-ISSN: 2468-4376

License

Apache 2.0
