kube-transform

kube-transform is a lightweight open-source framework for writing and deploying distributed, batch-oriented data pipelines on Kubernetes.

It is intentionally thin — only ~1,000 lines of code — and designed to be an easy-to-understand layer between your code and Kubernetes. Unlike heavyweight orchestration platforms like Apache Airflow (which has hundreds of thousands of lines), kube-transform aims to be simple to reason about, debug, and extend.

It focuses on simplicity and flexibility:

  • Minimal required configuration
  • No persistent control plane components
  • Vendor-agnostic: compatible with any Kubernetes cluster, image, and file store that meet a few basic requirements

If you're looking for a quick way to get started, check out kube-transform-starter-kit for reusable setup resources like Dockerfiles, Terraform, and RBAC templates. But using the starter kit is entirely optional.


Requirements

Your setup must meet these basic requirements:

1. Deployment Inputs

To run a pipeline, you must provide:

  • pipeline_spec: A Python dictionary that conforms to the KTPipeline schema. You can optionally write your spec using the KTPipeline Pydantic model directly and then call .model_dump() to convert it to a dict (see the sketch after this list).
  • image_path: A string path to your Docker image
  • data_dir: A string path to your file store (must be valid from the perspective of pods running in the cluster)
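
For example, you can keep the spec as a plain dict (or a JSON file) and validate it with the Pydantic model before submitting. This is only a sketch: it assumes KTPipeline is importable from kube_transform.spec (the exact import path may differ in your version), and pipeline.json is a placeholder filename.

import json

from kube_transform.spec import KTPipeline  # assumption: the import path may differ

# Load a spec maintained as JSON; its structure must conform to the KTPipeline schema.
with open("pipeline.json") as f:
    raw_spec = json.load(f)

# Validate against the Pydantic model, then convert back to a plain dict for run_pipeline.
pipeline_spec = KTPipeline.model_validate(raw_spec).model_dump()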

2. Docker Image

  • Must include Python 3.11+
  • Must have kube-transform installed (e.g. via pip)
  • Must include your code in /app/kt_functions/, which should be an importable module containing the functions referenced in your pipeline
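
A minimal Dockerfile along these lines should satisfy these requirements; the base image, dependency handling, and local directory layout are illustrative, and the starter kit contains more complete examples.

FROM python:3.11-slim

# kube-transform plus whatever your transformation functions need
RUN pip install kube-transform

# Your functions, importable as kt_functions inside the container
COPY kt_functions/ /app/kt_functions/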

3. File Store ("DATA_DIR")

  • This is the directory that all pipeline jobs and the controller will read from and write to. It will be passed as the data_dir argument to run_pipeline().
  • Can be a local folder (e.g. /mnt/data) or a cloud object store (e.g. s3://some-bucket/)
  • Must be readable and writable by all pods in your cluster
  • Compatible with anything fsspec supports

The KT controller internally uses fsspec.open() to write pipeline metadata to DATA_DIR. You must ensure that DATA_DIR is transparently accessible to all pods (including the controller). This typically means one of the following:

  • You're using a mounted volume that is accessible at /mnt/data
  • You've configured access via IRSA (for S3) or Workload Identity (for GCS), so the kt-pod service account has permissions to access your object store

In a single-node cluster, simply mounting a local folder at /mnt/data works: every KT pod will see DATA_DIR at that path.
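
To verify access before running a pipeline, you can exercise the same fsspec mechanism from inside a pod. This is a minimal sketch: the marker filename is arbitrary, and object-store paths require the matching fsspec backend (e.g. s3fs for S3) to be installed in the image.

import fsspec

data_dir = "s3://some-bucket"  # or "/mnt/data" for a mounted volume

# Write and read back a small marker file to confirm the pod can reach DATA_DIR.
with fsspec.open(f"{data_dir}/kt_access_check.txt", "w") as f:
    f.write("ok")
with fsspec.open(f"{data_dir}/kt_access_check.txt", "r") as f:
    assert f.read() == "ok"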

4. Kubernetes Cluster

  • Must be able to pull your Docker image (e.g. via ECR, DockerHub, etc.)
  • Must be able to access your file store
  • Must include a service account named kt-pod in the default namespace, with permission to create Kubernetes Jobs
  • Your deployment machine must be able to connect to the cluster (e.g. via kubectl)
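
If you are not using the starter kit's RBAC templates, the service account and its permissions can be created directly with kubectl. The role and binding names below are arbitrary, and the extra read verbs (get/list/watch) are included on the assumption that the controller also needs to monitor Job status.

kubectl create serviceaccount kt-pod --namespace default
kubectl create role kt-job-runner --namespace default --verb=create,get,list,watch --resource=jobs
kubectl create rolebinding kt-pod-jobs --namespace default --role=kt-job-runner --serviceaccount=default:kt-pod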

For a working example setup with autoscaling, IAM roles, and RBAC configuration, see kube-transform-starter-kit.


How to Run

Once your inputs are ready:

from kube_transform import run_pipeline

run_pipeline(pipeline_spec, image_path, data_dir)

This will:

  1. Launch a temporary kt-controller Job in your Kubernetes cluster
  2. Submit all pipeline jobs in the correct dependency order
  3. Shut down automatically when the pipeline completes (or fails)

You can view progress using:

kubectl get pods
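
To follow the controller's own output, you can tail its Job logs. The Job is named kt-controller as described above, though the exact name may vary by version.

kubectl logs -f job/kt-controller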

Make sure the version of kube-transform installed locally (which provides the run_pipeline function) matches the version installed in your image.
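
One way to confirm the versions match is to compare your local installation with the one baked into the image (this assumes pip is on the image's PATH; <your-image> stands in for your image path).

pip show kube-transform
docker run --rm <your-image> pip show kube-transform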


Autoscaling & Resource Management

kube-transform is compatible with both fixed-size and autoscaling clusters:

  • If your cluster supports autoscaling, KT will take advantage of it automatically.
  • If your cluster is fixed-size, jobs will remain in Pending state until resources are available.
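
On a fixed-size cluster, you can see which pods are waiting for capacity by filtering for the Pending phase:

kubectl get pods --field-selector=status.phase=Pending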

For help setting up either configuration, see kube-transform-starter-kit.


Questions? Feature requests? Open an issue on GitHub.
