Fine-tune LLMs to sound like you.
Customain learns your writing style from your own real text and conversations, then fine-tunes large language models to mimic your tone, voice, and communication patterns. The result is a custom AI that doesn't sound generic; it sounds like you.
Your emails → Extract & clean → Fine-tune → A model that writes like you
- Connect a content source (Gmail today, more coming)
- Process your text into high-quality, anonymized training pairs
- Fine-tune OpenAI models on your writing style
- Evaluate how well the model captures your tone — with both classical metrics and a trained authorship classifier
| Source | Status |
|---|---|
| Gmail | ✅ Available |
| Outlook | 🔜 Planned |
| Slack | 🔜 Planned |
| Notion | 🔜 Planned |
| Google Docs | 🔜 Planned |
| Provider | Models | Methods | Status |
|---|---|---|---|
| OpenAI | GPT-4.1, 4.1-mini, 4.1-nano, 4o, 4o-mini | SFT, DPO | ✅ Available |
| Together AI | Llama, Mixtral, Qwen + any HF model | -- | 🔜 Planned |
- Python 3.11+
- uv package manager
- OpenAI API key
- Gmail OAuth credentials (for Gmail source)
```
git clone https://github.com/user/customain.git
cd customain
uv sync
```

Create `.secrets/api_keys.json`:
```json
{
  "openai_api_key": "sk-...",
  "wandb_api_key": "optional-for-tracking"
}
```

For Gmail, you'll also need OAuth credentials; see Google's guide.
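If you want to use these keys in your own scripts, here is a minimal loading sketch; `load_keys` is illustrative, and the project's actual loader may differ:

```python
import json
import os
from pathlib import Path

def load_keys(path: str = ".secrets/api_keys.json") -> dict:
    """Read the API keys file and export the OpenAI key for SDK use."""
    keys = json.loads(Path(path).read_text())
    os.environ["OPENAI_API_KEY"] = keys["openai_api_key"]
    return keys
```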
Run the full Gmail preprocessing pipeline:
```
uv run python -m gmail_preprocessing_pipeline.run_pipeline
```

Or skip steps you've already completed:
```
# Already exported Gmail: start from extract
uv run python -m gmail_preprocessing_pipeline.run_pipeline --start-from 2

# Re-run just anonymize + format
uv run python -m gmail_preprocessing_pipeline.run_pipeline --start-from 5
```

The pipeline runs six steps:
1. Export Gmail threads to mbox
2. Extract email-reply pairs
3. Clean signatures, quotes, links (LLM)
4. Filter low-quality pairs (LLM)
5. Anonymize person names → `[NAME]` (LLM)
6. Format into SFT train/test split
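For intuition, step 2 boils down to walking the mbox and pairing each reply with the message it answers. A minimal sketch using Python's stdlib `mailbox` module; the pipeline's actual extraction logic is more robust than this:

```python
import mailbox

def get_body(msg) -> str:
    """Return the first text/plain part as a decoded string."""
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                return part.get_payload(decode=True).decode(errors="replace")
        return ""
    return msg.get_payload(decode=True).decode(errors="replace")

def extract_pairs(mbox_path: str) -> list[tuple[str, str]]:
    """Pair each reply body with the body of the message it answers."""
    box = mailbox.mbox(mbox_path)
    by_id = {msg["Message-ID"]: msg for msg in box if msg["Message-ID"]}
    pairs = []
    for msg in box:
        parent = by_id.get(msg["In-Reply-To"])
        if parent is not None:
            pairs.append((get_body(parent), get_body(msg)))
    return pairs
```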
Output: `data/sft_train.jsonl` and `data/sft_test.jsonl`
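Since the pipeline targets OpenAI fine-tuning, each line in these files presumably follows OpenAI's chat fine-tuning format; the contents below are illustrative:

```json
{"messages": [{"role": "system", "content": "You write emails in the user's personal style."}, {"role": "user", "content": "Hey, are you free to meet Thursday?"}, {"role": "assistant", "content": "Thursday works. 2pm at the usual spot?"}]}
```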
Configure which models and hyperparameters to try in `ft/training_configs.py`.
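As a rough illustration, a config entry might pair a base model with OpenAI hyperparameters like this; the field names here are hypothetical, and the real schema lives in `ft/training_configs.py`:

```python
# Illustrative only: mirror the actual schema in ft/training_configs.py.
training_configs = [
    {
        "model": "gpt-4o-mini-2024-07-18",  # base model to fine-tune
        "method": "sft",                    # supervised fine-tuning
        "hyperparameters": {
            "n_epochs": 3,
            "batch_size": 8,
            "learning_rate_multiplier": 1.0,
        },
    },
]
```

Then run the full pipeline: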
```
uv run python -m ft.run_pipeline \
    --train-file data/sft_train.jsonl \
    --test-file data/sft_test.jsonl
```

Or run a quick test with a small subset first:
```
uv run python -m ft.run_pipeline \
    --train-file data/sft_train.jsonl \
    --test-file data/sft_test.jsonl \
    --test-run
```

You can also skip steps you've already completed:
```
# Skip data upload and job launch, just evaluate
uv run python -m ft.run_pipeline \
    --train-file data/sft_train.jsonl \
    --test-file data/sft_test.jsonl \
    --skip 1 2
```

The pipeline will:
- Upload data and launch fine-tuning jobs across your configured model/hyperparameter combinations
- Poll until all jobs complete
- Run each fine-tuned model on the test set
- Evaluate results and log metrics to Weights & Biases
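Under the hood, the upload-and-launch and polling steps boil down to standard OpenAI fine-tuning calls. A minimal sketch with the official Python SDK; the pipeline itself adds multiple configs, error handling, and W&B logging:

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training data and launch one fine-tuning job.
train_file = client.files.create(
    file=open("data/sft_train.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=train_file.id, model="gpt-4o-mini-2024-07-18"
)

# Poll until the job reaches a terminal state.
while True:
    job = client.fine_tuning.jobs.retrieve(job.id)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

print(job.status, job.fine_tuned_model)  # model ID to use at inference time
```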
Customain includes a pluggable evaluation framework. Evaluators are auto-discovered: drop a new one into `ft/evaluation/evaluators/` and it is picked up automatically. An evaluator can be ML-based, statistical, or any other form you prefer. The following ML-based and metric/statistical evaluators are already implemented:
| Evaluator | What it measures |
|---|---|
| `authorship_classifier` | CNN-based authorship probability score |
| `tone_judge` | LLM-as-judge scoring tone & style fidelity |
| `bleu` | N-gram overlap (BLEU score) |
| `meteor` | Token-level alignment (METEOR score) |
| `semantic_similarity` | Embedding cosine similarity |
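As a sketch of what an evaluator might look like, here is an embedding-cosine-similarity scorer in the spirit of the built-in `semantic_similarity` evaluator. The class shape and `score` signature are hypothetical; mirror an existing evaluator in `ft/evaluation/evaluators/` for the real interface:

```python
import numpy as np
from openai import OpenAI

class SemanticSimilarityEvaluator:
    """Illustrative evaluator: cosine similarity of text embeddings."""

    name = "semantic_similarity"

    def __init__(self):
        self.client = OpenAI()

    def score(self, generated: str, reference: str) -> float:
        resp = self.client.embeddings.create(
            model="text-embedding-3-small", input=[generated, reference]
        )
        a = np.array(resp.data[0].embedding)
        b = np.array(resp.data[1].embedding)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```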
Configure which evaluators to skip in `ft/training_configs.py`:

```python
skip_evaluators = ["bleu", "meteor"]  # skips BLEU and METEOR; the remaining evaluators run
```

The `authorship_classifier` evaluator is backed by a character-level CNN text classifier trained to distinguish the author's writing from other people's. Unlike LLM-as-judge evaluators, it learns style patterns directly from data, so it does not suffer from the known reliability issues of LLM judges. Its current best performance is 91% precision.
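For intuition, a character-level CNN for binary authorship classification has roughly this shape. This is a minimal PyTorch sketch, not the project's actual architecture:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level CNN: embed chars, convolve, max-pool, classify."""

    def __init__(self, vocab_size: int = 128, embed_dim: int = 32,
                 n_filters: int = 64, kernel_sizes: tuple = (3, 5, 7)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(n_filters * len(kernel_sizes), 1)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (batch, seq_len) of character codes
        x = self.embed(char_ids).transpose(1, 2)      # (batch, embed, seq)
        pooled = [conv(x).relu().amax(dim=2) for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))      # authorship logit
```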
```
# Prepare training data from existing SFT data
uv run python -m classifiers.authorship.prepare_data

# Train (logs to W&B under customain-classifiers)
uv run python -m classifiers.authorship.train \
    --train-data data/classifiers/authorship/train.jsonl \
    --val-data data/classifiers/authorship/val.jsonl

# The authorship_classifier evaluator auto-registers and uses the trained checkpoint
```

This project is licensed under the GNU Affero General Public License v3.0 (AGPLv3).