Skip to content

Conversation

@ntamas92
Copy link
Contributor

No description provided.

Comment on lines +149 to +151
if status["status"] != "Completed":
raise JobError(status, self)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why raise a JobError if the job is not completed? perhaps its still running?

Copy link
Contributor Author

@ntamas92 ntamas92 Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought process was that the usage pattern would be the following:

export_job = dataset.export_embeddings()
export_job.sleep_until_complete(False)
result = export_job.result_urls()

We could just wait for the result urls inside result_urls() also, but then I'd highlight it somehow that obtaining the results could run for a long time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, that makes sense, didn't noticed the AsyncJob inheritence.
This is a neat idea, to have customized job result classes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add a wait_for_completion parameter. It might even be the default to wait for the job to complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, let's do that

Copy link
Contributor

@jean-lucas jean-lucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Contributor

@gatli gatli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me! Let's address the instantiation of the EmbeddingsExportJob (and for that matter any AsyncJob) such that we can let people trigger this in one process and poll in another.

poetry run black --check .
- run:
name: Ruff Lint Check # See pyproject.tooml [tool.ruff]
name: Ruff Lint Check # See pyproject.toml [tool.ruff]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merci 🙏

Comment on lines +149 to +151
if status["status"] != "Completed":
raise JobError(status, self)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add a wait_for_completion parameter. It might even be the default to wait for the job to complete.

Comment on lines 134 to 135
class EmbeddingsExportJob(AsyncJob):
def result_urls(self) -> List[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how you would instantiate this in another process. I think we need a classmethod from_id that would allow you to spin this up in one environment and then poll in another just from the job_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that would be used through the NucleusClient.list_jobs method though right? So something like this:

jobs = NucleusClient.list_jobs()
export_job = EmbeddingsExportJob.from_job_id(jobs[0].job_id)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a from_id to the AsyncJob, but I couldn't make it more typesafe (e.g. client argument is) still inferred as any. Do you have any ideas on how to improve?

@gatli
Copy link
Contributor

gatli commented Sep 19, 2023

image We really need to address this flakiness 😓

@ntamas92
Copy link
Contributor Author

image We really need to address this flakiness 😓

Even more so now that it hides actually failing tests, tests were failing because of my changes, and I didn't realize, I just assumed they are just flaky 🤦

@ntamas92 ntamas92 merged commit cb10ec6 into master Sep 21, 2023
@ntamas92 ntamas92 deleted the tamasn/async-embeddings-export branch September 21, 2023 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants