add llama-stack introduction and simple usage #111

---
weight: 82
---
# Llama Stack

<Overview />

---
weight: 20
---

# Install Llama Stack

This document describes how to install and deploy Llama Stack Server on Kubernetes using the Llama Stack Operator.

## Upload Operator

Download the Llama Stack Operator installation file (e.g., `llama-stack-operator.alpha.ALL.v0.7.0.tgz`).

Use the `violet` command to publish it to the platform repository. Replace `platform-access-address`, `platform-admin`, and `platform-admin-password` with your platform address and administrator credentials:

```bash
violet push \
  --platform-address=platform-access-address \
  --platform-username=platform-admin \
  --platform-password=platform-admin-password \
  llama-stack-operator.alpha.ALL.v0.7.0.tgz
```

## Install Operator

1. Go to the `Administrator` view in the Alauda Container Platform.
2. In the left navigation, select `Marketplace` / `Operator Hub`.
3. In the right panel, find `Alauda build of Llama Stack` and click `Install`.
4. Keep all parameters at their default values and complete the installation.

## Deploy Llama Stack Server

After the operator is installed, deploy Llama Stack Server by creating a `LlamaStackDistribution` custom resource:

> **Note:** Prepare the following in advance; otherwise the distribution may not become ready:
> - **Secret**: Create a Secret (e.g., `deepseek-api`) in the same namespace with the LLM API token. Example: `kubectl create secret generic deepseek-api -n default --from-literal=token=<LLM_API_KEY>`.
> - **Storage Class**: Ensure the `default` Storage Class exists in the cluster; otherwise the PVC cannot be bound and the resource will not become ready.

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  annotations:
    cpaas.io/display-name: ""
  name: demo
  namespace: default
spec:
  network:
    exposeRoute: false # Whether to expose the route externally
  replicas: 1 # Number of server replicas
  server:
    containerSpec:
      env:
        - name: VLLM_URL
          value: "https://api.deepseek.com/v1" # URL of the LLM API provider
        - name: VLLM_MAX_TOKENS
          value: "8192" # Maximum output tokens
        - name: VLLM_API_TOKEN # Load LLM API token from secret
          valueFrom:
            secretKeyRef: # Create this Secret in the same namespace beforehand, e.g. kubectl create secret generic deepseek-api -n default --from-literal=token=<LLM_API_KEY>
              key: token
              name: deepseek-api
      name: llama-stack
      port: 8321
    distribution:
      name: starter # Distribution name (options: starter, postgres-demo, meta-reference-gpu)
    storage:
      mountPath: /home/lls/.lls
      size: 20Gi # Requires the "default" Storage Class to be configured beforehand
```
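
You may prefer to manage the Secret declaratively instead of running `kubectl create secret`. The Python sketch below builds a Secret manifest that should be equivalent to that command. It relies on standard Kubernetes behavior: `--from-literal=token=<value>` stores the value base64-encoded under `data.token` in an `Opaque` Secret. The function name `build_token_secret` is hypothetical. The output is JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import base64
import json


def build_token_secret(name: str, namespace: str, token: str) -> dict:
    """Return a Secret manifest equivalent (by assumption) to
    `kubectl create secret generic <name> -n <namespace> --from-literal=token=<token>`.
    """
    # Secret `data` values must be base64-encoded strings.
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # The key must match `secretKeyRef.key` in the LlamaStackDistribution.
        "data": {"token": encoded},
    }


manifest = build_token_secret("deepseek-api", "default", "<LLM_API_KEY>")
print(json.dumps(manifest, indent=2))
```

Redirect the printed JSON to a file and apply it with `kubectl apply -f <file>`. Avoid committing a file that contains a real API key.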

After deployment, the Llama Stack Server will be available within the cluster. The access URL is reported in the resource's `status.serviceURL` field, for example:

```yaml
status:
  phase: Ready
  serviceURL: http://demo-service.default.svc.cluster.local:8321
```
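
To confirm the server is reachable from inside the cluster (for example, from a Notebook pod), you can call its health endpoint. The sketch below uses only the Python standard library and makes two assumptions. First, the service name follows the `<name>-service` pattern seen in the example `status.serviceURL`. Second, the server exposes `GET /v1/health`, as recent Llama Stack releases do. Check both against your deployment. The helper names `service_url` and `check_health` are hypothetical.

```python
import json
import urllib.request


def service_url(name: str, namespace: str, port: int = 8321) -> str:
    """Build the in-cluster URL for a LlamaStackDistribution.

    Assumes the `<name>-service` naming inferred from the example
    serviceURL http://demo-service.default.svc.cluster.local:8321.
    """
    return f"http://{name}-service.{namespace}.svc.cluster.local:{port}"


def check_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/v1/health and return the decoded JSON body.

    Assumes the endpoint exists in your Llama Stack version; raises
    urllib.error.URLError if the server cannot be reached.
    """
    url = base_url.rstrip("/") + "/v1/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For the `demo` distribution above, call `check_health(service_url("demo", "default"))` from a pod in the cluster. If you have the actual `status.serviceURL` value, passing it directly avoids relying on the naming assumption.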

---
weight: 20
---

# Main Features

## Server-Based Architecture

- **Centralized Server**: Llama Stack Server hosts inference, agents, safety, tool runtime, vector I/O, and files
- **Remote or Inline Providers**: Support for remote APIs (e.g., OpenAI-compatible) and inline providers (e.g., meta-reference, sqlite-vec, localfs)
- **Kubernetes Deployment**: Deploy via Llama Stack Operator using `LlamaStackDistribution` custom resources

## AI Agents with Tools

- **Agent Creation**: Create agents with model, instructions, and a list of tools
- **Client-Side Tools**: Define tools with the `@client_tool` decorator; the client executes tool calls and returns results to the server
- **Session Management**: Create sessions and run multi-turn conversations with streaming responses
- **Streaming**: Support for streaming agent responses for real-time display
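
The client-side tool flow works like this: the server requests a tool call, the client runs the function locally, and the client sends the result back. This flow can be illustrated without a server. The standard-library sketch below is not the `llama-stack-client` API. `register_tool`, `TOOLS`, and `run_tool_call` are hypothetical stand-ins for what the `@client_tool` decorator and the agent loop do for you, and the weather function returns canned data.

```python
import json
from typing import Callable, Dict

# Hypothetical registry standing in for the tools passed to an agent.
TOOLS: Dict[str, Callable[..., str]] = {}


def register_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Record a function under its name so tool calls can be dispatched to it."""
    TOOLS[func.__name__] = func
    return func


@register_tool
def get_weather(city: str) -> str:
    """Return a canned weather report (a real tool would call a weather API)."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 22})


def run_tool_call(tool_name: str, arguments_json: str) -> str:
    """Execute a tool call requested by the model and return the result text.

    In the real flow, this result is sent back to the server, which hands it
    to the model for the next step of the conversation.
    """
    if tool_name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    args = json.loads(arguments_json)
    return TOOLS[tool_name](**args)


print(run_tool_call("get_weather", '{"city": "Beijing"}'))
```

In `llama-stack-client`, the `@client_tool` decorator also builds the tool definition (name, description, parameters) from the function's signature and docstring. See the client documentation for the exact docstring format it expects.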

## Configuration and Extensibility

- **Stack Configuration**: YAML-based configuration for APIs, providers, persistence (e.g., `kv_default`, `sql_default`), and models
- **Environment Fallbacks**: Use `${env.VAR:~default}` in config for flexible deployment
- **Multiple Distributions**: `starter`, `postgres-demo`, `meta-reference-gpu`, and other distribution options
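
The sketch below shows how a `${env.VAR:~default}` placeholder could be resolved, to make the fallback behavior concrete. It illustrates the idea and is not Llama Stack's own resolver. It assumes the following semantics:

- The variable's value is used when it is set and non-empty.
- Otherwise, the text after `:~` is used.
- A placeholder with no fallback and no value is an error.

The function name `resolve_env` is hypothetical.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${env.NAME} and ${env.NAME:~fallback}
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::~([^}]*))?\}")


def resolve_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace env placeholders in a config string (illustrative semantics)."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name, fallback = match.group(1), match.group(2)
        value = env.get(name)
        if value:  # set and non-empty
            return value
        if fallback is not None:  # `:~` present, possibly empty
            return fallback
        raise KeyError(f"environment variable {name} is not set and has no fallback")

    return _PLACEHOLDER.sub(substitute, text)


config_line = "url: ${env.VLLM_URL:~http://localhost:8000/v1}"
print(resolve_env(config_line, env={}))
print(resolve_env(config_line, env={"VLLM_URL": "https://api.deepseek.com/v1"}))
```

Passing an explicit `env` mapping keeps the function testable. Omitting it reads from the process environment, as a server would.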

## Integration

- **Python Client**: `llama-stack-client` for Python 3.12+ with full agent and model APIs
- **REST-Friendly**: The server exposes HTTP APIs for inference, agents, and tool runtime; agent applications built on the client can be wrapped in FastAPI or other web frameworks for production use

---
weight: 10
---

# Overview

<Overview />

---
weight: 10
---
# Introduction

## Llama Stack

*Llama Stack* is a framework for building and running AI agents with tools. It provides a server-based architecture that enables developers to create agents that can interact with users, access external tools, and perform complex reasoning tasks.

Main components and concepts include:

- **Llama Stack Server**: Central service that hosts models, agents, and tool runtime. It can be deployed on Kubernetes via the Llama Stack Operator (see [Install Llama Stack](/en/llama_stack/install)).
- **Client SDK** (`llama-stack-client`): Python client for connecting to the server, creating agents, defining tools with the `@client_tool` decorator, and managing sessions.
- **Agents**: Configurable AI agents that use LLM models and can call tools (e.g., weather API, custom APIs) to answer user queries.
- **Tools**: Functions exposed to the agent (e.g., weather query). Defined with `@client_tool` and passed to the agent at creation time.
- **Configuration**: YAML stack configuration defines providers (inference, agents, safety, vector_io, files), persistence backends, and model registration (e.g., DeepSeek via OpenAI-compatible API).

Llama Stack supports multiple API providers, storage and persistence backends, and distribution options (e.g., starter, postgres-demo, meta-reference-gpu), making it suitable for quick experiments and production deployments.

## Documentation

Llama Stack provides official documentation and resources for in-depth usage:

### Official Documentation
- **Main Documentation**: [https://llamastack.github.io/docs](https://llamastack.github.io/docs)
  - Usage, API providers, and core concepts
- **Core Concepts**: [https://llamastack.github.io/docs/concepts](https://llamastack.github.io/docs/concepts)
  - Architecture, API stability, and resource management

---
weight: 30
---

# Quickstart

This section provides a quickstart example for creating an AI Agent with Llama Stack.

## Prerequisites

- Python 3.12 or higher (if your environment does not provide it, see [FAQ: How to prepare Python 3.12 in Notebook](#how-to-prepare-python-312-in-notebook))
- Llama Stack Server installed and running via the Operator (see [Install Llama Stack](./install))
- Access to a Notebook environment (e.g., Jupyter Notebook, JupyterLab)
- A Python environment with `llama-stack-client` and required dependencies installed
- An API key for the LLM provider (e.g., a DeepSeek API key)
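
Before installing `llama-stack-client`, you can confirm from a notebook cell that the active kernel meets the Python 3.12 requirement. This is a small sketch; `meets_minimum` is a hypothetical helper name.

```python
import sys
from typing import Tuple


def meets_minimum(version: Tuple[int, ...], minimum: Tuple[int, int] = (3, 12)) -> bool:
    """Return True if `version` (e.g. sys.version_info) is at least `minimum`."""
    return tuple(version[:2]) >= minimum


# Show which interpreter the kernel is running and whether it is new enough.
print(sys.executable)
print("Python", sys.version.split()[0], "OK" if meets_minimum(sys.version_info) else "too old, see the FAQ")
```
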

## Quickstart Example

A simple example of creating an AI Agent with Llama Stack is available in the following notebook:

- **Notebook**: [Llama Stack Quick Start Demo](/llama-stack/llama-stack_quickstart.ipynb)

Download the notebook and upload it to a Notebook environment to run it.

The notebook demonstrates:

- Client setup and connection to Llama Stack Server
- Tool definition using the `@client_tool` decorator (weather query tool example)
- Model selection and Agent creation with tools and instructions
- Agent execution with session management and streaming responses
- Result handling and display
- Optional FastAPI deployment example

## FAQ

### How to prepare Python 3.12 in Notebook

1. Download the pre-compiled Python installation package:

   ```bash
   wget -O /tmp/python312.tar.gz https://github.com/astral-sh/python-build-standalone/releases/download/20260114/cpython-3.12.12+20260114-x86_64-unknown-linux-gnu-install_only.tar.gz
   ```

2. Extract it:

   ```bash
   mkdir -p ~/python312
   tar -xzf /tmp/python312.tar.gz -C ~/python312 --strip-components=1
   ```

3. Install `ipykernel` and register the kernel:

   ```bash
   export PATH="${HOME}/python312/bin:${PATH}"

   python3 -m pip install ipykernel
   python3 -m ipykernel install --user --name python312 --display-name "Python 3.12"
   ```

4. Switch the kernel on the notebook page:

   - Open your Notebook environment (e.g., Jupyter Notebook or JupyterLab) in the browser, then open an existing notebook or create a new one.
   - In the notebook interface, find the current kernel name (usually shown in the **top-right corner** of the page, e.g., "Python 3" or "python3").
   - Click that kernel name, or use the menu **Kernel → Change Kernel**.
   - In the kernel list, select **"Python 3.12"** (the display name registered in step 3).
   - After switching, new cells will run with Python 3.12.

**Note**: Shell commands run from the notebook page (for example, `!python` or `!pip` in a cell) still use the default Python, not Python 3.12. To target Python 3.12, call it by its full path, e.g. `~/python312/bin/python3 -m pip install <package>`.
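
As an example, the sketch below installs packages into the Python 3.12 environment from a notebook cell by calling that interpreter through its full path. It assumes the `~/python312` location used in step 2. `python312_pip_command` and `install_into_python312` are hypothetical helpers, and `llama-stack-client` is only an example package.

```python
import os
import subprocess
from typing import List


def python312_pip_command(packages: List[str], prefix: str = "~/python312") -> List[str]:
    """Build a pip command that runs under the Python 3.12 interpreter in `prefix`."""
    python = os.path.join(os.path.expanduser(prefix), "bin", "python3")
    return [python, "-m", "pip", "install", *packages]


def install_into_python312(packages: List[str]) -> None:
    """Run the pip command; raises CalledProcessError if pip fails."""
    subprocess.run(python312_pip_command(packages), check=True)


print(" ".join(python312_pip_command(["llama-stack-client"])))
# Uncomment to actually install:
# install_into_python312(["llama-stack-client"])
```

Once the notebook runs on the "Python 3.12" kernel, IPython's `%pip install` magic also installs into the kernel's own interpreter, so it is an equivalent option there.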

## Additional Resources

For more resources on developing AI Agents with Llama Stack, see:

- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.
- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.
- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.