33 changes: 33 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,33 @@
# Project

> This repo has been populated by an initial template to help get you started. Please
> make sure to update the content to build a great experience for community-building.

As the maintainer of this project, please make a few updates:

- Improve this README.md file to provide a great experience
- Update SUPPORT.md with content about this project's support experience
- Understand the security reporting process in SECURITY.md
- Remove this section from the README

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.
144 changes: 120 additions & 24 deletions README.md
@@ -1,33 +1,129 @@
---
title: Get started with AI Foundry Local
titleSuffix: AI Foundry Local
description: Learn how to install, configure, and run your first AI model with AI Foundry Local
manager: scottpolly
keywords: Azure AI services, cognitive, AI models, local inference
ms.service: azure-ai-foundry
ms.topic: quickstart
ms.date: 02/20/2025
ms.reviewer: samkemp
ms.author: samkemp
author: samuel100
ms.custom: build-2025
#customer intent: As a developer, I want to get started with AI Foundry Local so that I can run AI models locally.
---

# Get started with AI Foundry Local

This article shows you how to get started with AI Foundry Local to run AI models on your device. Follow these steps to install the tool, discover available models, and run your first local AI model.

## Prerequisites

- A PC with sufficient specifications to run AI models locally
- Windows 10 or later
- Greater than 8GB RAM
- Greater than 3GB of free disk space for model caching (quantized Phi 3.2 models are ~3GB)
- Suggested hardware for optimal performance:
- Windows 11
- NVIDIA GPU (2000 series or newer) OR AMD GPU (6000 series or newer) OR Qualcomm Snapdragon X Elite, with 8GB or more of VRAM
- Greater than 16GB RAM
- Greater than 15GB of free disk space for model caching (the largest models are ~15GB)
- Administrator access to install software
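A quick way to verify the free-disk-space prerequisite before installing is a short script. This is an illustrative sketch, not part of Foundry Local; the 15 GB figure comes from the suggested hardware list above.

```python
import shutil

def enough_disk_for_models(path=".", required_gb=15):
    """Return True if the drive holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3

print(enough_disk_for_models())
```

Run it from the directory where the model cache will live, since that drive is the one that matters.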

## Quickstart in two steps

Follow these steps to get started with AI Foundry Local:

1. **Install Foundry Local**

1. Download AI Foundry Local for your platform (Windows, macOS, Linux - x64/ARM) from the repository's releases page.
2. Install the package by following the on-screen prompts.

**IMPORTANT: For macOS/Linux users:** Run both components in separate terminals:

- Neutron Server (`Inference.Service.Agent`) - Use `chmod +x Inference.Service.Agent` to make executable
- Foundry Client (`foundry`) - Use `chmod +x foundry` to make executable, and add to your PATH

3. After installation, access the tool via command line with `foundry`.

2. **Run your first model**
1. Open a command prompt or terminal window.
2. Run the DeepSeek-R1 model on the CPU using the following command:
```bash
foundry model run deepseek-r1-1.5b-cpu
```

**💡 TIP:** The `foundry model run <model>` command automatically downloads the model if it isn't already cached on your local machine, and then starts an interactive chat session with it. Try out other models by replacing `deepseek-r1-1.5b-cpu` with the name of any model in the catalog; list them with the `foundry model list` command.

## Explore Foundry Local CLI commands

The foundry CLI is structured into several categories:

- **Model**: Commands related to managing and running models
- **Service**: Commands for managing the AI Foundry Local service
- **Cache**: Commands for managing the local cache where models are stored

To see all available commands, use the help option:

```bash
foundry --help
```

**💡 TIP:** For a complete reference of all available CLI commands and their usage, see the [Foundry Local CLI Reference](./reference/reference-cli.md)

## Security and privacy considerations

AI Foundry Local is designed with privacy and security as core principles:

- **Local processing**: All data processed by AI Foundry Local remains on your device and is never sent to Microsoft or any external services.
- **No telemetry**: AI Foundry Local does not collect usage data or model inputs.
- **Air-gapped environments**: AI Foundry Local can be used in disconnected environments after initial model download.

### Security best practices

- Use AI Foundry Local in environments that align with your organization's security policies.
- For handling sensitive data, ensure your device meets your organization's security requirements.
- Consider disk encryption for devices where cached models might contain sensitive fine-tuning data.

### Licensing considerations

Models available through AI Foundry Local are subject to their original licenses:

- Open-source models maintain their original licenses (e.g., Apache 2.0, MIT).
- Commercial models may have specific usage restrictions or require separate licensing.
- Always review the licensing information for each model before deploying in production.

## Production deployment scope

AI Foundry Local is designed primarily for:

- Individual developer workstations
- Single-node deployment
- Local application development and testing

**⚠️ IMPORTANT:** AI Foundry Local is not currently intended for distributed, containerized, or multi-machine production deployment. For production-scale deployment needs, consider Azure AI Foundry for enterprise-grade availability and scale.

## Troubleshooting

### Common issues and solutions

| Issue | Possible Cause | Solution |
| ----------------------- | --------------------------------------- | ----------------------------------------------------------------------------------------- |
| Slow inference | CPU-only model on large parameter count | Use GPU-optimized model variants when available |
| Model download failures | Network connectivity issues | Check your internet connection, try `foundry cache list` to verify cache state |
| Service won't start | Port conflicts or permission issues | Try `foundry service restart`, or file an issue with logs collected via `foundry zip-logs` |

### Diagnosing performance issues

If you're experiencing slow inference:

1. Check that you're using GPU acceleration if available
2. Monitor memory usage during inference to detect bottlenecks
3. Consider a more quantized model variant (e.g., INT8 instead of FP16)
4. Experiment with batch sizes for non-interactive workloads
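A rough way to quantify "slow inference" when comparing model variants is tokens per second. The helper below is a generic sketch (not a Foundry Local API): time a generation, count the tokens produced, and compare the throughput of, say, a CPU variant against a GPU-optimized one.

```python
def tokens_per_second(token_count, elapsed_seconds):
    """Throughput of one generation run; guards against a zero-length timing window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds

# Example: 128 tokens generated in 8 seconds gives 16.0 tokens/sec.
print(tokens_per_second(128, 8))
```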

## Next steps

- [Learn how to integrate AI Foundry Local with your applications](./how-to/integrate-with-inference-sdks.md)
- [Explore the AI Foundry Local documentation](./index.yml)
128 changes: 128 additions & 0 deletions concepts/foundry-local-architecture.md
@@ -0,0 +1,128 @@
---
title: Foundry Local Architecture
titleSuffix: AI Foundry Local
description: This article articulates the Foundry Local architecture
manager: scottpolly
ms.service: azure-ai-foundry
ms.custom: build-2025
ms.topic: concept-article
ms.date: 02/12/2025
ms.author: samkemp
author: samuel100
---

# Foundry Local Architecture

Foundry Local is designed to enable efficient, secure, and scalable AI model inference directly on local devices. This article explains the key components of the Foundry Local architecture and how they interact to deliver AI capabilities.

The benefits of Foundry Local include:

- **Low Latency**: By running models locally, Foundry Local minimizes the time it takes to process requests and return results.
- **Data Privacy**: Sensitive data can be processed locally without sending it to the cloud, ensuring compliance with data protection regulations.
- **Flexibility**: Foundry Local supports a wide range of hardware configurations, allowing users to choose the best setup for their needs.
- **Scalability**: Foundry Local can be deployed on various devices, from personal computers to powerful servers, making it suitable for different use cases.
- **Cost-Effectiveness**: Running models locally can reduce costs associated with cloud computing, especially for high-volume applications.
- **Offline Capabilities**: Foundry Local can operate without an internet connection, making it ideal for remote or disconnected environments.
- **Integration with Existing Workflows**: Foundry Local can be easily integrated into existing development and deployment workflows, allowing for a smooth transition to local inference.

## Key Components

The key components of the Foundry Local architecture are articulated in the following diagram:

![Foundry Local Architecture Diagram](../media/architecture/foundry-local-arch.png)

### Foundry Local Service

The Foundry Local Service is an OpenAI-compatible REST server that provides a standardized interface for interacting with the inference engine and managing models. Developers can use this API to send requests, run models, and retrieve results programmatically.

- **Endpoint**: `http://localhost:5272/v1`
- **Use Cases**:
- Integrating Foundry Local with custom applications.
- Running models via HTTP requests.
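Because the service exposes an OpenAI-compatible endpoint, any plain HTTP client can talk to it. The sketch below only builds the request; the `/chat/completions` route and request body follow the OpenAI convention and are assumptions for illustration, not confirmed Foundry Local specifics.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5272/v1"  # Foundry Local service endpoint

def build_chat_request(model, prompt):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("deepseek-r1-1.5b-cpu", "Hello!")
print(req.full_url)
# To actually send it (with the service running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```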

### ONNX Runtime

The ONNX runtime is a core component responsible for running AI models. It uses optimized ONNX models to perform inference efficiently on local hardware, such as CPUs, GPUs, or NPUs.

**Features**:

- Supports multiple hardware providers (for example: NVIDIA, AMD, Intel) and devices (for example: NPUs, CPUs, GPUs).
- Provides a unified interface for running models on different hardware platforms.
- Best-in-class performance.
- Supports quantized models for faster inference.

### Model Management

Foundry Local provides robust tools for managing AI models, ensuring that they're readily available for inference and easy to maintain. Model management is handled through the **Model Cache** and the **Command-Line Interface (CLI)**.

#### Model Cache

The model cache is a local storage system where AI models are downloaded and stored. It ensures that models are available for inference without requiring repeated downloads. The cache can be managed using the Foundry CLI or REST API.

- **Purpose**: Reduces latency by storing models locally.
- **Management Commands**:
- `foundry cache list`: Lists all models stored in the local cache.
- `foundry cache remove <model-name>`: Deletes a specific model from the cache.
- `foundry cache cd <path>`: Changes the directory where models are stored.
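These commands are easy to script around. The sketch below wraps `foundry cache list` with `subprocess`; the output format (one model name per line) is an assumption for illustration, so the parser accepts canned text for testing without the CLI installed.

```python
import subprocess

def list_cached_models(sample_output=None):
    """Return cached model names from `foundry cache list`.

    The one-name-per-line output format is an assumption, not a documented
    contract. Pass `sample_output` to parse canned text instead of invoking
    the CLI.
    """
    if sample_output is None:
        sample_output = subprocess.run(
            ["foundry", "cache", "list"],
            capture_output=True, text=True, check=True,
        ).stdout
    return [line.strip() for line in sample_output.splitlines() if line.strip()]

print(list_cached_models("deepseek-r1-1.5b-cpu\nphi-3-mini\n"))
```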

#### Model Lifecycle

1. **Download**: Models are downloaded from the Azure AI Foundry model catalog to local disk.
2. **Load**: Models are loaded into the Foundry Local service (and therefore memory) for inference. You can set a TTL (time-to-live) for how long the model should remain in memory (the default is 10 minutes).
3. **Run**: Models are inferenced.
4. **Unload**: Models can be unloaded from the inference engine to free up resources.
5. **Delete**: Models can be deleted from the local cache to free up disk space.
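The load/unload behavior in steps 2 and 4 can be pictured as a TTL-governed cache entry. This is a conceptual sketch of the idea, not Foundry Local's implementation; the 600-second default mirrors the 10-minute TTL mentioned in step 2.

```python
import time

class LoadedModel:
    """Toy model of an in-memory model entry that expires after idle time."""

    def __init__(self, name, ttl_seconds=600):  # 600 s = the 10-minute default
        self.name = name
        self.ttl_seconds = ttl_seconds
        self.last_used = time.monotonic()

    def touch(self):
        """Each inference request resets the idle clock."""
        self.last_used = time.monotonic()

    def should_unload(self, now=None):
        """True once the model has been idle longer than its TTL."""
        now = time.monotonic() if now is None else now
        return (now - self.last_used) > self.ttl_seconds

m = LoadedModel("deepseek-r1-1.5b-cpu", ttl_seconds=1)
print(m.should_unload())  # False: freshly loaded, idle time is ~0
```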

#### Model Compilation using Olive

Before models can be used with Foundry Local, they must be compiled and optimized in the [ONNX](https://onnx.ai) format. Microsoft provides a selection of published models in the Azure AI Foundry Model Catalog that are already optimized for Foundry Local. However, you aren't limited to those models: you can prepare your own by using [Olive](https://microsoft.github.io/Olive/). Olive is a powerful framework for preparing AI models for efficient inference. It converts models into the ONNX format, optimizes their graph structure, and applies techniques like quantization to improve performance on local hardware.

**💡 TIP**: To learn more about compiling models for Foundry Local, read [Compile Hugging Face models for Foundry Local](../how-to/compile-models-for-foundry-local.md).

### Hardware Abstraction Layer

The hardware abstraction layer ensures that Foundry Local can run on various devices by abstracting the underlying hardware. To optimize performance based on the available hardware, Foundry Local supports:

- **multiple _execution providers_**, such as NVIDIA CUDA, AMD, Qualcomm, Intel.
- **multiple _device types_**, such as CPU, GPU, NPU.

### Developer Experiences

The Foundry Local architecture is designed to provide a seamless developer experience, enabling easy integration and interaction with AI models.

Developers can choose from various interfaces to interact with the system, including:

#### Command-Line Interface (CLI)

The Foundry CLI is a powerful tool for managing models, the inference engine, and the local cache.

**Examples**:

- `foundry model list`: Lists all available models in the catalog.
- `foundry model run <model-name>`: Runs a model.
- `foundry service status`: Checks the status of the service.

**💡 TIP**: To learn more about the CLI commands, read [Foundry Local CLI Reference](../reference/reference-cli.md).

#### Inferencing SDK Integration

Foundry Local supports integration with various SDKs, such as the OpenAI SDK, enabling developers to use familiar programming interfaces to interact with the local inference engine.

- **Supported SDKs**: Python, JavaScript, C#, and more.

**💡 TIP**: To learn more about integrating with inferencing SDKs, read [Integrate Foundry Local with Inferencing SDKs](../how-to/integrate-with-inference-sdks.md).

#### AI Toolkit for Visual Studio Code

The AI Toolkit for Visual Studio Code provides a user-friendly interface for developers to interact with Foundry Local. It allows users to run models, manage the local cache, and visualize results directly within the IDE.

- **Features**:
- Model management: Download, load, and run models from within the IDE.
- Interactive console: Send requests and view responses in real-time.
- Visualization tools: Graphical representation of model performance and results.

## Next Steps

- [Get started with AI Foundry Local](../get-started.md)
- [Integrate with Inference SDKs](../how-to/integrate-with-inference-sdks.md)
- [Foundry Local CLI Reference](../reference/reference-cli.md)