This repository provides scripts and resources for Continued Pre-Training (CPT) of language models using LoRA (Low-Rank Adaptation), with a focus on flexibility, efficiency, and reproducibility.
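Regardless of the folder, the common recipe is to attach low-rank adapters to a base model's attention projections and train only those weights. The sketch below shows that step with Hugging Face `transformers` and `peft`; the model id, rank, and target modules are illustrative assumptions, not this repository's actual configuration:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any causal LM works here; the 32B Qwen coder model is one
# of the bases this repository targets.
MODEL_ID = "Qwen/Qwen2.5-Coder-32B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to keep memory manageable
    device_map="auto",
)

# Illustrative LoRA hyperparameters; the configs in this repository may differ.
lora_config = LoraConfig(
    r=16,                 # adapter rank
    lora_alpha=32,        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the small adapter matrices receive gradients, optimizer state stays tiny even against a 32B base model, which is what makes CPT at this scale tractable.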
The repository contains three main folders, each dedicated to a specific part of the training workflow:
This folder includes scripts and resources for fine-tuning CPT models in the context of the KwaiPilot adaptation or dataset, with code for data handling, training, and evaluation tailored to this use case.
This folder focuses on fine-tuning CPT models with the Qwen-2.5-Coder-Instruct-32B configuration or dataset. Its scripts are specialized for instruction-based, coder-centric training workflows.
This folder contains tools for downloading Haskell packages from Hackage and preparing them as training datasets for Large Language Models (LLMs). It automates package downloading, source-code processing, and dataset preparation for Continued Pre-Training (CPT) tasks, using memory-efficient streaming so large archives are never fully buffered (see the sketch after this overview).
Each folder contains its own scripts, configurations, and documentation for running experiments and training models.
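To illustrate the streaming approach used by the Hackage tooling, here is a minimal Python sketch that downloads one package tarball and yields its `.hs` sources file by file without ever buffering the whole archive. The function name, package, and version are placeholders and the real scripts in the folder will differ; the URL pattern, however, is Hackage's standard tarball layout:

```python
import tarfile

import requests

# Hackage's standard tarball layout.
HACKAGE_URL = "https://hackage.haskell.org/package/{pkg}-{ver}/{pkg}-{ver}.tar.gz"


def stream_haskell_sources(pkg: str, ver: str):
    """Yield (path, source) pairs for the .hs files in one Hackage package,
    reading the tarball as a forward-only stream so it is never fully
    held in memory."""
    url = HACKAGE_URL.format(pkg=pkg, ver=ver)
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # "r|gz" = streaming gzip mode: members must be read in order,
        # but no seekable file (and no temp file on disk) is required.
        with tarfile.open(fileobj=resp.raw, mode="r|gz") as tar:
            for member in tar:
                if member.isfile() and member.name.endswith(".hs"):
                    handle = tar.extractfile(member)
                    if handle is not None:
                        yield member.name, handle.read().decode(
                            "utf-8", errors="replace"
                        )


if __name__ == "__main__":
    # Placeholder package/version, chosen only for illustration.
    for path, source in stream_haskell_sources("text", "2.1.1"):
        print(f"{path}: {len(source)} characters")
```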
- Clone the repository:

  ```bash
  git clone https://github.com/xynehq/code-trainer.git
  cd code-trainer
  ```

- Install dependencies:

  Refer to the individual README files in each folder for environment setup and installation instructions.

- Run a script:

  Scripts can be executed from their respective folders; consult the folder-specific documentation for details.
Contributions, suggestions, and feature requests are welcome! Feel free to open issues or submit pull requests to improve this repository.