From 329262bdf7cfdd9bdf29c9adbee170027bc592dc Mon Sep 17 00:00:00 2001 From: cbh123 Date: Fri, 19 Apr 2024 12:31:22 -0700 Subject: [PATCH 1/2] async example, and link documentation --- README.md | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index e22fc9b0..a44e19af 100644 --- a/README.md +++ b/README.md @@ -42,14 +42,24 @@ Create a new Python file and add the following code, replacing the model identif ['https://replicate.com/api/models/stability-ai/stable-diffusion/files/50fcac81-865d-499e-81ac-49de0cb79264/out-0.png'] ``` -Some models, particularly language models, may not require the version string. Refer to the API documentation for the model for more on the specifics: +Some models, particularly language models, may not require the version string. You can always refer to the API documentation on the model page for specifics (for example, [check out the Llama 3 API documentation](https://replicate.com/meta/meta-llama-3-70b-instruct/api)). ```python replicate.run( - "meta/llama-2-70b-chat", + "meta/meta-llama-3-70b-instruct", input={ - "prompt": "Can you write a poem about open source machine learning?", - "system_prompt": "You are a helpful, respectful and honest assistant.", + "prompt": "Can you write a poem about open source machine learning?" + }, +) +``` + +Here is the async equivalent of the above: + +```python +replicate.models.predictions.create( + "meta/meta-llama-3-70b-instruct", + input={ + "prompt": "Can you write a poem about open source machine learning?" }, ) ``` @@ -69,14 +79,14 @@ Or, for smaller files (<10MB), you can pass a file handle directly. ``` > [!NOTE] -> You can also use the Replicate client asynchronously by prepending `async_` to the method name. -> +> You can also use the Replicate client asynchronously by prepending `async_` to the method name. 
+> > Here's an example of how to run several predictions concurrently and wait for them all to complete: > > ```python > import asyncio > import replicate -> +> > # https://replicate.com/stability-ai/sdxl > model_version = "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b" > prompts = [ @@ -96,7 +106,7 @@ Or, for smaller files (<10MB), you can pass a file handle directly. ## Run a model and stream its output -Replicate’s API supports server-sent event streams (SSEs) for language models. +Replicate’s API supports server-sent event streams (SSEs) for language models. Use the `stream` method to consume tokens as they're produced by the model. ```python @@ -132,7 +142,6 @@ for event in prediction.stream(): For more information, see ["Streaming output"](https://replicate.com/docs/streaming) in Replicate's docs. - ## Run a model in the background You can start a model and run it in the background: @@ -337,12 +346,12 @@ Here's how to list of all the available hardware for running models on Replicate ## Fine-tune a model -Use the [training API](https://replicate.com/docs/fine-tuning) -to fine-tune models to make them better at a particular task. -To see what **language models** currently support fine-tuning, +Use the [training API](https://replicate.com/docs/fine-tuning) +to fine-tune models to make them better at a particular task. +To see what **language models** currently support fine-tuning, check out Replicate's [collection of trainable language models](https://replicate.com/collections/trainable-language-models). -If you're looking to fine-tune **image models**, +If you're looking to fine-tune **image models**, check out Replicate's [guide to fine-tuning image models](https://replicate.com/docs/guides/fine-tune-an-image-model). 
Here's how to fine-tune a model on Replicate: From bab6f5a8e5cccca8713ac2b9ccc14e180d3835bc Mon Sep 17 00:00:00 2001 From: Mattt Zmuda Date: Fri, 28 Jun 2024 05:58:55 -0700 Subject: [PATCH 2/2] Reorganize and reword discussion of running models Signed-off-by: Mattt Zmuda --- README.md | 37 ++++++++----------------------------- 1 file changed, 8 insertions(+), 29 deletions(-) diff --git a/README.md b/README.md index a44e19af..6d8df947 100644 --- a/README.md +++ b/README.md @@ -42,28 +42,6 @@ Create a new Python file and add the following code, replacing the model identif ['https://replicate.com/api/models/stability-ai/stable-diffusion/files/50fcac81-865d-499e-81ac-49de0cb79264/out-0.png'] ``` -Some models, particularly language models, may not require the version string. You can always refer to the API documentation on the model page for specifics (for example, [check out the Llama 3 API documentation](https://replicate.com/meta/meta-llama-3-70b-instruct/api)). - -```python -replicate.run( - "meta/meta-llama-3-70b-instruct", - input={ - "prompt": "Can you write a poem about open source machine learning?" - }, -) -``` - -Here is the async equivalent of the above: - -```python -replicate.models.predictions.create( - "meta/meta-llama-3-70b-instruct", - input={ - "prompt": "Can you write a poem about open source machine learning?" - }, -) -``` - Some models, like [andreasjansson/blip-2](https://replicate.com/andreasjansson/blip-2), have files as inputs. To run a model that takes a file input, pass a URL to a publicly accessible file. @@ -107,16 +85,13 @@ Or, for smaller files (<10MB), you can pass a file handle directly. ## Run a model and stream its output Replicate’s API supports server-sent event streams (SSEs) for language models. -Use the `stream` method to consume tokens as they're produced by the model. +Use the `stream` method to consume tokens as they're produced. 
 ```python
 import replicate
 
-# https://replicate.com/meta/llama-2-70b-chat
-model_version = "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3"
-
 for event in replicate.stream(
-    model_version,
+    "meta/meta-llama-3-70b-instruct",
     input={
         "prompt": "Please write a haiku about llamas.",
     },
@@ -124,13 +99,17 @@ for event in replicate.stream(
     print(str(event), end="")
 ```
 
+> [!TIP]
+> Some models, like [meta/meta-llama-3-70b-instruct](https://replicate.com/meta/meta-llama-3-70b-instruct),
+> don't require a version string.
+> You can always refer to the API documentation on the model page for specifics.
+
 You can also stream the output of a prediction you create.
 This is helpful when you want the ID of the prediction separate from its output.
 
 ```python
-version = "02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3"
 prediction = replicate.predictions.create(
-    version=version,
+    model="meta/meta-llama-3-70b-instruct",
     input={"prompt": "Please write a haiku about llamas."},
     stream=True,
 )