Update arch overview #119
@@ -7,28 +7,56 @@ The diagram below illustrates the architecture of the Alauda AI platform.

 

-NOTE: Alauda AI uses some general Kubernetes, ACP components including:
+## Component Description

-* ALB
-* Erebus
-* kube-apiserver (kubernetes component)
+### Components in Alauda Container Platform Layer

 | Component | Description | Type | License |
 | --- | --- | --- | --- |
-| Lich | Alauda AI UI console | Self-developed | |
-| aml-operator | Manages installation and life cycles of Alauda AI components | Self-developed | |
-| aml-apiserver | Extends kubernetes api-server and provide authorization enhancements for Alauda AI API access | Self-developed | |
-| skipper & oauth2-proxy | Proxies traffic from the global cluster to workload clusters. Traffic is authenticated by oauth2-proxy | Open source | Apache Version 2.0 |
-| aml-controller | Manages Alauda AI namespaces on workload clusters. Namespaces will be automatically configured a Model Repo space and corresponding resources. | Self-developed | |
-| aml-api-deploy | Provides high-level APIs for "Lich" | Self-developed | |
-| Gitlab (with Minio or S3) | Model repository backend storage and version tracking. | Open source | MIT |
-| kserve-controller | (Optionally with knative serving enabled) Manages AI inference services and inference service runtimes. | Open source | Apache Version 2.0 |
-| workspace-controller | Manages workbench instances (jupyter notebooks, codeserver) | Open source | Apache Version 2.0 |
-| Volcano | Plugin to provide co-scheduling (gang-scheduling) features for AI training jobs. Also manages "volcanojob" resource to run general training workloads. | Open source | Apache Version 2.0 |
-| MLFlow | Track training, evaluation jobs by storing, visualizing metrics and artifacts | Open source | Apache Version 2.0 |
-| Fine Tuning | Experimental UI providing no-code LLM fine tunning job creation and management | Self-developed | |
-| Kubeflow | Open source plugin providing MLOps features including: Notebooks, Tensorboard, Kubeflow pipeline, training operator. | Open source | Apache Version 2.0 |
-| Label Studio | Open source plugin for dataset labeling | Open source | Apache Version 2.0 |
-| Dify | Open source plugin for creating LLM Agents, RAG applications using a web UI | Open source | a modified version of the Apache License 2.0 |
-| Evidently | Open source plugin for monitoring online inference service performance and data drifts | Open source | Apache Version 2.0 |
-| GPU device plugins | HAMi and nvidia gpu device plugin | Open source | Apache Version 2.0 |
+| GPU (Alauda Build of Nvidia GPU Device Plugin) | Provides GPU resources for AI workloads | Open source | Apache Version 2.0 |
+| HAMi (Alauda Build of Hami, Alauda Build of Hami-WebUI) | GPU resource slicing, sharing and scheduling | Open source | Apache Version 2.0 |
+| Alauda Build of DCGM-Exporter | GPU monitoring | Open source | Apache Version 2.0 |
+| Alauda Build of NPU Operator | Provides NPU resources for AI workloads | Open source | Apache Version 2.0 |
+| Alauda Build of Node Feature Discovery | Detects hardware features of cluster nodes | Open source | Apache Version 2.0 |
+| DRA (Alauda build of NVIDIA DRA Driver for GPUs) | Dynamic Resource Allocation for GPU sharing | Open source | Apache Version 2.0 |
+| Volcano (Alauda support for Volcano) | Batch job scheduling for AI workloads | Open source | Apache Version 2.0 |
+| Kueue (Alauda Build of Kueue) | Job scheduling for AI workloads | Open source | Apache Version 2.0 |
+| Milvus (Alauda Build of Milvus) | Vector database for embedding storage and retrieval | Open source | Apache Version 2.0 |
+| PGVector (Alauda support for PostgreSQL) | PostgreSQL extension for vector similarity search | Open source | The PostgreSQL License |
+
+
+### Components in AI Platform Layer
+
+| Component | Description | Type | License |
+| --- | --- | --- | --- |
+| Model Catalog (Alauda AI/Alauda AI Essentials) | Centralized repository for managing AI models and their metadata | Proprietary | Commercial |
+| Model Registry (Alauda support for Kubeflow Model Registry) | Keep track of AI model versions and metadata for each namespace | Open source | Apache Version 2.0 |
+| Datasets (Alauda AI/Alauda AI Essentials) | Centralized repository for managing datasets and their metadata | Proprietary | Commercial |
+| Labeling (Alauda support for Label Studio) | Data labeling tool for creating labeled datasets | Open source | Apache Version 2.0 |
+| Feature Store (Alauda support for FeatureForm) | Centralized repository for managing and serving machine learning features | Open source | Mozilla Public License (MPL) |
+| Workbench (Alauda AI Workbench) | Web-based interface for managing AI projects, including model training and inference | Proprietary | Commercial |
+| Training Jobs (Alauda support for Kubeflow Trainer v2) | Kubernetes-native training job management | Open source | Apache Version 2.0 |
+| Kubeflow Pipelines (Alauda support for Kubeflow Base & Alauda support for Kubeflow Pipeline) | Workflow orchestration for AI pipelines | Open source | Apache Version 2.0 |
+| Guardrails (Coming soon) | AI safety and governance framework | Open source | Apache Version 2.0 |
+| Drift & Bias Detection (Alauda support for Evidently) | Monitoring for model performance degradation and bias | Open source | Apache Version 2.0 |
+| Experiment Tracking (Alauda support for MLFlow) | Tracking and comparing machine learning experiments | Open source | Apache Version 2.0 |
+
+
+### Components in GenAI Platform Layer
+
+| Component | Description | Type | License |
+| --- | --- | --- | --- |
+| Kserve (Alauda AI Model Serving/Alauda Generative AI) | Kubernetes-native model serving framework | Open source | Apache Version 2.0 |
+| vLLM (Alauda AI Model Serving/Alauda Generative AI) | High-performance model inference engine for large language models | Open source | Apache Version 2.0 |
+| llm-d (Alauda Generative AI) | Distributed inference engine for large language models | Open source | Apache Version 2.0 |
+| Model as a Service (Alauda build of Envoy AI Gateway) | API gateway for serving AI models as a service | Open source | Apache Version 2.0 |
+| Fine-tuning | Tools integrated with the workbench for fine-tuning large language models, e.g. transformers, accelerate, llama-factory etc. | Open source | - |
+| Training (Alauda support for Kubeflow Trainer v2) | Kubernetes-native training job management | Open source | Apache Version 2.0 |

Review comment:
Potential duplicate of "Training Jobs" entry.
Line 54 "Training (Alauda support for Kubeflow Trainer v2)" appears to duplicate Line 38 "Training Jobs (Alauda support for Kubeflow Trainer v2)" with an identical description. Consider removing this duplicate or clarifying how they differ.
Suggested fix:
-| Training (Alauda support for Kubeflow Trainer v2) | Kubernetes-native training job management | Open source | Apache Version 2.0 |
Or, if they serve different purposes, update the descriptions to differentiate them.

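Duplicates like the one flagged above could also be caught mechanically. The following is only a minimal sketch, assuming the component tables stay in plain pipe-delimited Markdown with `Component` as the first column and the description as the second; the names `parse_rows`, `duplicate_descriptions`, and `SAMPLE` are hypothetical and not part of this repository.

```python
from collections import defaultdict


def parse_rows(markdown: str) -> list[list[str]]:
    """Return the cells of every non-separator row in pipe-delimited tables."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip separator rows such as | --- | --- |
        if all(set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    return rows


def duplicate_descriptions(markdown: str) -> dict[str, list[str]]:
    """Map each description used by more than one row to its component names."""
    by_description = defaultdict(list)
    for cells in parse_rows(markdown):
        # Ignore malformed rows and the header row (assumed to start with "Component").
        if len(cells) < 2 or cells[0] == "Component":
            continue
        by_description[cells[1]].append(cells[0])
    return {d: names for d, names in by_description.items() if len(names) > 1}


# Hypothetical sample: three rows copied from the tables in this diff.
SAMPLE = """
| Component | Description | Type | License |
| --- | --- | --- | --- |
| Training Jobs (Alauda support for Kubeflow Trainer v2) | Kubernetes-native training job management | Open source | Apache Version 2.0 |
| Kserve (Alauda AI Model Serving/Alauda Generative AI) | Kubernetes-native model serving framework | Open source | Apache Version 2.0 |
| Training (Alauda support for Kubeflow Trainer v2) | Kubernetes-native training job management | Open source | Apache Version 2.0 |
"""

if __name__ == "__main__":
    for description, names in duplicate_descriptions(SAMPLE).items():
        print(f"{description!r} is used by: {', '.join(names)}")
```

Matching on the description column alone is a deliberate simplification: it flags the "Training Jobs" / "Training" pair, but it would miss duplicates whose wording differs slightly.
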
+| Model Quantization | Tools integrated with the workbench for model quantization, e.g. llm-compressor etc. | Open source | - |
+| Evaluation | Tools integrated with the workbench for evaluating model performance, e.g. lm-evaluation-harness etc. | Open source | - |
+| Llama Stack (Alauda build of Llama Stack) | Framework for building applications with large language models | Open source | MIT |
+| Langchain | Tools integrated with the workbench for building LLM applications using Langchain | Open source | MIT |
+| Dify (Alauda support for Dify) | Platform for building AI assistants and chatbots | Open source | Apache Version 2.0 (modified) |

Review comment:
🧩 Analysis chain
🌐 Web query:
💡 Result:
Apache-2.0 “modified terms” / additional conditions (from the official

+| MCP Servers | Can integrate with various MCP servers | - | - |
+| Agent Tracing (Alauda support for MLflow) | Tracing and monitoring for AI agents | Open source | Apache Version 2.0 |

Review comment on lines +42 to +61:
Use consistent Line 41 uses

+| Agent Evaluation | Tools integrated with the workbench for evaluating AI agents, e.g. RAGAS etc. | Open source | - |

Review comment on lines +53 to +62:
Replace open-source license placeholders with explicit values.
On Lines 52-58 and Line 61,
Suggested table fix pattern:
-| Fine-tuning | Tools integrated with the workbench for fine-tuning large language models, e.g. transformers, accelerate, llama-factory etc. | Open source | - |
+| Fine-tuning | Tools integrated with the workbench for fine-tuning large language models, e.g. transformers, accelerate, llama-factory etc. | Open source | Multiple/Varies (see component docs) |
-| Model Quantization | Tools integrated with the workbench for model quantization, e.g. llm-compressor etc. | Open source | - |
+| Model Quantization | Tools integrated with the workbench for model quantization, e.g. llm-compressor etc. | Open source | Multiple/Varies (see component docs) |

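Review comment:
Should this list every component, or only the main ones? Some are still missing, for example PG Vector (shown in the diagram), LWS (not in the diagram), etc.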