Add support for asynchronous embeddings export #394
Changes from all commits
95ab42a
adbdb57
4374f9b
47af8e5
4bd86a1
76c00a8
3bdcd37
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -119,6 +119,20 @@ def sleep_until_complete(self, verbose_std_out=True): | |
| if final_status["status"] == "Errored": | ||
| raise JobError(final_status, self) | ||
||
| @classmethod | ||
| def from_id(cls, job_id: str, client: "NucleusClient"): # type: ignore # noqa: F821 | ||
| """Creates a job instance from a specific job Id. | ||
||
| Parameters: | ||
| job_id: Defines the job Id | ||
| client: The client to use for the request. | ||
||
| Returns: | ||
| The specific AsyncMethod (or inherited) instance. | ||
| """ | ||
| job = client.get_job(job_id) | ||
| return cls.from_json(job.__dict__, client) | ||
||
| @classmethod | ||
| def from_json(cls, payload: dict, client): | ||
| # TODO: make private | ||
| @@ -131,6 +145,34 @@ def from_json(cls, payload: dict, client): | |
| ) | ||
||
||
| class EmbeddingsExportJob(AsyncJob): | ||
| def result_urls(self, wait_for_completion=True) -> List[str]: | ||
| """Gets a list of signed Scale URLs for each embedding batch. | ||
||
| Parameters: | ||
| wait_for_completion: Defines whether the call shall wait for | ||
| the job to complete. Defaults to True | ||
||
| Returns: | ||
| A list of signed Scale URLs which contain batches of embeddings. | ||
||
| The files contain a JSON array of embedding records with the following schema: | ||
| [{ | ||
| "reference_id": str, | ||
| "embedding_vector": List[float] | ||
| }] | ||
| """ | ||
| if wait_for_completion: | ||
| self.sleep_until_complete(verbose_std_out=False) | ||
||
| status = self.status() | ||
||
| if status["status"] != "Completed": | ||
| raise JobError(status, self) | ||
||
|
Comment on lines +170 to +172
Contributor: why raise a
Contributor (Author): My thought process was that the usage pattern would be the following:
export_job = dataset.export_embeddings()
export_job.sleep_until_complete(False)
result = export_job.result_urls()
We could just wait for the result urls inside
Contributor: Alright, that makes sense, I didn't notice the AsyncJob inheritance.
Contributor: Maybe we should add a
Contributor (Author): Good idea, let's do that |
||
| return status["message"]["result"] # type: ignore | ||
||
||
| class JobError(Exception): | ||
| def __init__(self, job_status: Dict[str, str], job: AsyncJob): | ||
| final_status_message = job_status["message"] | ||
||
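
The `result_urls` docstring in this diff says each signed URL points to a JSON array of records with `reference_id` and `embedding_vector` fields. A minimal sketch of how a caller might merge those batches into one mapping is shown here. The helper names `fetch_bytes` and `load_embeddings`, the injectable `fetch` parameter, and the sample batch data are assumptions for illustration and are not part of the Nucleus client.

```python
import json
from typing import Callable, Dict, Iterable, List
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Download the raw body of one batch URL (standard library only)."""
    with urlopen(url) as response:
        return response.read()


def load_embeddings(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Dict[str, List[float]]:
    """Merge every batch into a single reference_id -> vector mapping.

    `fetch` is injectable (an assumption of this sketch) so the parsing
    logic can be exercised without network access.
    """
    embeddings: Dict[str, List[float]] = {}
    for url in urls:
        # Each batch is a JSON array of records, per the docstring schema.
        for record in json.loads(fetch(url)):
            embeddings[record["reference_id"]] = record["embedding_vector"]
    return embeddings


# Offline demonstration with made-up batches standing in for signed URLs.
fake_batches = {
    "batch_0": json.dumps(
        [{"reference_id": "img_1", "embedding_vector": [0.1, 0.2]}]
    ).encode(),
    "batch_1": json.dumps(
        [{"reference_id": "img_2", "embedding_vector": [0.3, 0.4]}]
    ).encode(),
}
merged = load_embeddings(fake_batches, fetch=fake_batches.__getitem__)
```

With a real job, the list returned by `export_job.result_urls()` would be passed as `urls` and the default `fetch` would download each batch.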
Thanks 🙏
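
The behaviour discussed in this review can be tried without a server. The following is a rough sketch under stated assumptions: `FakeClient`, `SketchJob`, `SketchEmbeddingsExportJob` and `SketchJobError` are hypothetical stand-ins, the `"Running"` status string is invented for the demo, and none of this is the actual Nucleus implementation. It illustrates two points from the diff: a `from_id` classmethod returns an instance of whichever subclass it is called on, and `result_urls` raises when the job status is not `"Completed"`.

```python
from typing import Dict, List


class SketchJobError(Exception):
    """Hypothetical stand-in for JobError."""


class FakeClient:
    """Hypothetical in-memory client that returns canned job statuses."""

    def __init__(self, statuses: Dict[str, dict]):
        self._statuses = statuses

    def job_status(self, job_id: str) -> dict:
        return self._statuses[job_id]


class SketchJob:
    def __init__(self, job_id: str, client: FakeClient):
        self.job_id = job_id
        self.client = client

    @classmethod
    def from_id(cls, job_id: str, client: FakeClient):
        # cls is the class the call was made on, so a subclass gets
        # back an instance of its own type.
        return cls(job_id, client)

    def status(self) -> dict:
        return self.client.job_status(self.job_id)


class SketchEmbeddingsExportJob(SketchJob):
    def result_urls(self) -> List[str]:
        status = self.status()
        # Mirror the check in the diff: anything but "Completed" raises.
        if status["status"] != "Completed":
            raise SketchJobError(status)
        return status["message"]["result"]


client = FakeClient(
    {
        "job_done": {
            "status": "Completed",
            "message": {"result": ["https://example.invalid/batch_0.json"]},
        },
        # "Running" is an invented status value for this sketch.
        "job_running": {"status": "Running", "message": {}},
    }
)

done = SketchEmbeddingsExportJob.from_id("job_done", client)
urls = done.result_urls()
```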