From 329262bdf7cfdd9bdf29c9adbee170027bc592dc Mon Sep 17 00:00:00 2001 From: cbh123 Date: Fri, 19 Apr 2024 12:31:22 -0700 Subject: [PATCH 1/2] async example, and link documentation --- README.md | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index e22fc9b0..a44e19af 100644 --- a/README.md +++ b/README.md @@ -42,14 +42,24 @@ Create a new Python file and add the following code, replacing the model identif ['https://replicate.com/api/models/stability-ai/stable-diffusion/files/50fcac81-865d-499e-81ac-49de0cb79264/out-0.png'] ``` -Some models, particularly language models, may not require the version string. Refer to the API documentation for the model for more on the specifics: +Some models, particularly language models, may not require the version string. You can always refer to the API documentation on the model page for specifics (for example, [check out the Llama 3 API documentation](https://replicate.com/meta/meta-llama-3-70b-instruct/api)). ```python replicate.run( - "meta/llama-2-70b-chat", + "meta/meta-llama-3-70b-instruct", input={ - "prompt": "Can you write a poem about open source machine learning?", - "system_prompt": "You are a helpful, respectful and honest assistant.", + "prompt": "Can you write a poem about open source machine learning?" + }, +) +``` + +Here is the async equivalent of the above: + +```python +replicate.models.predictions.create( + "meta/meta-llama-3-70b-instruct", + input={ + "prompt": "Can you write a poem about open source machine learning?" }, ) ``` @@ -69,14 +79,14 @@ Or, for smaller files (<10MB), you can pass a file handle directly. ``` > [!NOTE] -> You can also use the Replicate client asynchronously by prepending `async_` to the method name. -> +> You can also use the Replicate client asynchronously by prepending `async_` to the method name. 
+> > Here's an example of how to run several predictions concurrently and wait for them all to complete: > > ```python > import asyncio > import replicate -> +> > # https://replicate.com/stability-ai/sdxl > model_version = "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b" > prompts = [ @@ -96,7 +106,7 @@ Or, for smaller files (<10MB), you can pass a file handle directly. ## Run a model and stream its output -Replicate’s API supports server-sent event streams (SSEs) for language models. +Replicate’s API supports server-sent event streams (SSEs) for language models. Use the `stream` method to consume tokens as they're produced by the model. ```python @@ -132,7 +142,6 @@ for event in prediction.stream(): For more information, see ["Streaming output"](https://replicate.com/docs/streaming) in Replicate's docs. - ## Run a model in the background You can start a model and run it in the background: @@ -337,12 +346,12 @@ Here's how to list of all the available hardware for running models on Replicate ## Fine-tune a model -Use the [training API](https://replicate.com/docs/fine-tuning) -to fine-tune models to make them better at a particular task. -To see what **language models** currently support fine-tuning, +Use the [training API](https://replicate.com/docs/fine-tuning) +to fine-tune models to make them better at a particular task. +To see what **language models** currently support fine-tuning, check out Replicate's [collection of trainable language models](https://replicate.com/collections/trainable-language-models). -If you're looking to fine-tune **image models**, +If you're looking to fine-tune **image models**, check out Replicate's [guide to fine-tuning image models](https://replicate.com/docs/guides/fine-tune-an-image-model). 
Here's how to fine-tune a model on Replicate: From bab6f5a8e5cccca8713ac2b9ccc14e180d3835bc Mon Sep 17 00:00:00 2001 From: Mattt Zmuda Date: Fri, 28 Jun 2024 05:58:55 -0700 Subject: [PATCH 2/2] Reorganize and reword discussion of running models Signed-off-by: Mattt Zmuda --- README.md | 37 ++++++++----------------------------- 1 file changed, 8 insertions(+), 29 deletions(-) diff --git a/README.md b/README.md index a44e19af..6d8df947 100644 --- a/README.md +++ b/README.md @@ -42,28 +42,6 @@ Create a new Python file and add the following code, replacing the model identif ['https://replicate.com/api/models/stability-ai/stable-diffusion/files/50fcac81-865d-499e-81ac-49de0cb79264/out-0.png'] ``` -Some models, particularly language models, may not require the version string. You can always refer to the API documentation on the model page for specifics (for example, [check out the Llama 3 API documentation](https://replicate.com/meta/meta-llama-3-70b-instruct/api)). - -```python -replicate.run( - "meta/meta-llama-3-70b-instruct", - input={ - "prompt": "Can you write a poem about open source machine learning?" - }, -) -``` - -Here is the async equivalent of the above: - -```python -replicate.models.predictions.create( - "meta/meta-llama-3-70b-instruct", - input={ - "prompt": "Can you write a poem about open source machine learning?" - }, -) -``` - Some models, like [andreasjansson/blip-2](https://replicate.com/andreasjansson/blip-2), have files as inputs. To run a model that takes a file input, pass a URL to a publicly accessible file. @@ -107,16 +85,13 @@ Or, for smaller files (<10MB), you can pass a file handle directly. ## Run a model and stream its output Replicate’s API supports server-sent event streams (SSEs) for language models. -Use the `stream` method to consume tokens as they're produced by the model. +Use the `stream` method to consume tokens as they're produced. 
 ```python
 import replicate
 
-# https://replicate.com/meta/llama-2-70b-chat
-model_version = "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3"
-
 for event in replicate.stream(
-    model_version,
+    "meta/meta-llama-3-70b-instruct",
     input={
         "prompt": "Please write a haiku about llamas.",
     },
@@ -124,13 +99,17 @@ for event in replicate.stream(
     print(str(event), end="")
 ```
 
+> [!TIP]
+> Some models, like [meta/meta-llama-3-70b-instruct](https://replicate.com/meta/meta-llama-3-70b-instruct),
+> don't require a version string.
+> You can always refer to the API documentation on the model page for specifics.
+
 You can also stream the output of a prediction you create.
 This is helpful when you want the ID of the prediction separate from its output.
 
 ```python
-version = "02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3"
 prediction = replicate.predictions.create(
-    version=version,
+    model="meta/meta-llama-3-70b-instruct",
     input={"prompt": "Please write a haiku about llamas."},
     stream=True,
 )