Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/en/llama_stack/index.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
weight: 82
---
# Llama Stack

<Overview />
76 changes: 76 additions & 0 deletions docs/en/llama_stack/install.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
---
weight: 20
---

# Install Llama Stack

This document describes how to install and deploy Llama Stack Server on Kubernetes using the Llama Stack Operator.

## Upload Operator

Download the Llama Stack Operator installation file (e.g., `llama-stack-operator.alpha.ALL.v0.7.0.tgz`).

Use the violet command to publish to the platform repository:

```bash
violet push --platform-address=platform-access-address --platform-username=platform-admin --platform-password=platform-admin-password llama-stack-operator.alpha.ALL.v0.7.0.tgz
```

## Install Operator

1. Go to the `Administrator` view in the Alauda Container Platform.

2. In the left navigation, select `Marketplace` / `Operator Hub`.

3. In the right panel, find `Alauda build of Llama Stack` and click `Install`.

4. Keep all parameters as default and complete the installation.

## Deploy Llama Stack Server

After the operator is installed, deploy Llama Stack Server by creating a `LlamaStackDistribution` custom resource:

> **Note:** Prepare the following in advance; otherwise the distribution may not become ready:
> - **Secret**: Create a Secret (e.g., `deepseek-api`) in the same namespace with the LLM API token. Example: `kubectl create secret generic deepseek-api -n default --from-literal=token=<LLM_API_KEY>`.
> - **Storage Class**: Ensure the `default` Storage Class exists in the cluster; otherwise the PVC cannot be bound and the resource will not become ready.

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
annotations:
cpaas.io/display-name: ""
name: demo
namespace: default
spec:
network:
exposeRoute: false # Whether to expose the route externally
replicas: 1 # Number of server replicas
server:
containerSpec:
env:
- name: VLLM_URL
value: "https://api.deepseek.com/v1" # URL of the LLM API provider
- name: VLLM_MAX_TOKENS
value: "8192" # Maximum output tokens
- name: VLLM_API_TOKEN # Load LLM API token from secret
valueFrom:
secretKeyRef: # Create this Secret in the same namespace beforehand, e.g. kubectl create secret generic deepseek-api -n default --from-literal=token=<LLM_API_KEY>
key: token
name: deepseek-api
name: llama-stack
port: 8321
distribution:
name: starter # Distribution name (options: starter, postgres-demo, meta-reference-gpu)
storage:
mountPath: /home/lls/.lls
size: 20Gi # Requires the "default" Storage Class to be configured beforehand
```

After deployment, the Llama Stack Server will be available within the cluster. The access URL is displayed in `status.serviceURL`, for example:

```yaml
status:
phase: Ready
serviceURL: http://demo-service.default.svc.cluster.local:8321
```
29 changes: 29 additions & 0 deletions docs/en/llama_stack/overview/features.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
---
weight: 20
---

# Main Features

## Server-Based Architecture

- **Centralized Server**: Llama Stack Server hosts inference, agents, safety, tool runtime, vector I/O, and files
- **Remote or Inline Providers**: Support for remote APIs (e.g., OpenAI-compatible) and inline providers (e.g., meta-reference, sqlite-vec, localfs)
- **Kubernetes Deployment**: Deploy via Llama Stack Operator using `LlamaStackDistribution` custom resources

## AI Agents with Tools

- **Agent Creation**: Create agents with model, instructions, and a list of tools
- **Client-Side Tools**: Define tools with the `@client_tool` decorator; the client executes tool calls and returns results to the server
- **Session Management**: Create sessions and run multi-turn conversations with streaming responses
- **Streaming**: Support for streaming agent responses for real-time display

## Configuration and Extensibility

- **Stack Configuration**: YAML-based configuration for APIs, providers, persistence (e.g., kv_default, sql_default), and models
- **Environment Fallbacks**: Use `${env.VAR:~default}` in config for flexible deployment
- **Multiple Distributions**: Starter, postgres-demo, meta-reference-gpu and other distribution options

## Integration

- **Python Client**: `llama-stack-client` for Python 3.12+ with full agent and model APIs
- **REST-Friendly**: Server exposes APIs for inference, agents, and tool runtime; can be wrapped in FastAPI or other web frameworks for production use
7 changes: 7 additions & 0 deletions docs/en/llama_stack/overview/index.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
weight: 10
---

# Overview

<Overview />
28 changes: 28 additions & 0 deletions docs/en/llama_stack/overview/intro.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
---
weight: 10
---
# Introduction

## Llama Stack

*Llama Stack* is a framework for building and running AI agents with tools. It provides a server-based architecture that enables developers to create agents that can interact with users, access external tools, and perform complex reasoning tasks.

Main components and concepts include:

- **Llama Stack Server**: Central service that hosts models, agents, and tool runtime. It can be deployed on Kubernetes via the Llama Stack Operator (see [Install Llama Stack](/en/llama_stack/install)).
- **Client SDK** (`llama-stack-client`): Python client for connecting to the server, creating agents, defining tools with the `@client_tool` decorator, and managing sessions.
- **Agents**: Configurable AI agents that use LLM models and can call tools (e.g., weather API, custom APIs) to answer user queries.
- **Tools**: Functions exposed to the agent (e.g., weather query). Defined with `@client_tool` and passed to the agent at creation time.
- **Configuration**: YAML stack configuration defines providers (inference, agents, safety, vector_io, files), persistence backends, and model registration (e.g., DeepSeek via OpenAI-compatible API).

Llama Stack supports multiple API providers, storage and persistence backends, and distribution options (e.g., starter, postgres-demo, meta-reference-gpu), making it suitable for quick experiments and production deployments.

## Documentation

Llama Stack provides official documentation and resources for in-depth usage:

### Official Documentation
- **Main Documentation**: [https://llamastack.github.io/docs](https://llamastack.github.io/docs)
- Usage, API providers, and core concepts
- **Core Concepts**: [https://llamastack.github.io/docs/concepts](https://llamastack.github.io/docs/concepts)
- Architecture, API stability, and resource management
78 changes: 78 additions & 0 deletions docs/en/llama_stack/quickstart.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
---
weight: 30
---

# Quickstart

This section provides a quickstart example for creating an AI Agent with Llama Stack.

## Prerequisites

- Python 3.12 or higher (if not satisfied, refer to [FAQ: How to prepare Python 3.12 in Notebook](#how-to-prepare-python-312-in-notebook))
- Llama Stack Server installed and running via Operator (see [Install Llama Stack](./install))
- Access to a Notebook environment (e.g., Jupyter Notebook, JupyterLab)
- Python environment with `llama-stack-client` and required dependencies installed
- API key for the LLM provider (e.g., DeepSeek API key)

## Quickstart Example

A simple example of creating an AI Agent with Llama Stack is available in the following resources:

- **Notebook**:[Llama Stack Quick Start Demo](/llama-stack/llama-stack_quickstart.ipynb)

Download the notebook and upload it to a Notebook environment to run.

The notebook demonstrates:

- Connecting to Llama Stack Server and client setup
- Tool definition using the `@client_tool` decorator (weather query tool example)
- Client connection to Llama Stack Server
- Model selection and Agent creation with tools and instructions
- Agent execution with session management and streaming responses
- Result handling and display
- Optional FastAPI deployment example

## FAQ

### How to prepare Python 3.12 in Notebook

1. Download the pre-compiled Python installation package:

```bash
wget -O /tmp/python312.tar.gz https://github.com/astral-sh/python-build-standalone/releases/download/20260114/cpython-3.12.12+20260114-x86_64-unknown-linux-gnu-install_only.tar.gz
```

2. Extract with:

```bash
mkdir -p ~/python312
tar -xzf /tmp/python312.tar.gz -C ~/python312 --strip-components=1
```

3. Install and Register Kernel:

```bash
export PATH="${HOME}/python312/bin:${PATH}"

python3 -m pip install ipykernel
python3 -m ipykernel install --user --name python312 --display-name "Python 3.12"
```

4. Switch kernel in the notebook page:

- Open your Notebook environment (e.g., Jupyter Notebook or JupyterLab) in the browser, then open an existing notebook or create a new one.
- In the notebook interface, find the current kernel name (usually shown in the **top-right corner** of the page, e.g., "Python 3" or "python3").
- Click that kernel name, or use the menu **Kernel → Change Kernel**.
- In the kernel list, select **"Python 3.12"** (the display name registered in step 3).
- After switching, new cells will run with Python 3.12.

**Note**: When executing python and pip commands directly in the notebook page, the default python will still be used. You need to specify the full path to use the python312 version commands.

## Additional Resources

For more resources on developing AI Agents with Llama Stack, see:

- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.
- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.
- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.
Loading